Why I Built It
LuisOS started as a way to understand computers beyond APIs. I wanted to see every layer—from bootloader to scheduler to file system—and make it work under real hardware constraints. I used the Pintos teaching OS as a base, but the meaningful parts (threads, the file system, user programs) were built and tuned by hand.
Berkeley CS162 + AI-Native OS Mentorship
I took UC Berkeley CS162 with Ion Stoica and Matei Zaharia (Databricks co-founders), and was later mentored by them on AI-optimized operating systems. Their work—Spark, Sky Lab's AI-driven schedulers, and ADRS research—shaped how I think about kernels that learn. Instead of bolting ML onto Linux or Windows, the new wave builds AI in from the start: predictive schedulers, adaptive security, and agentic orchestration that cut latency for on-device inference on NPUs. Systems like Steve's proactive maintenance agents or AthenaOS's Rust-based swarms show how self-optimizing kernels can learn usage patterns and surface them to future ML models. The Databricks view: what Spark did for distributed data, AI-native kernels can do at the endpoint—faster inference, better scheduling, and unified data/AI stacks for edge and enterprise clusters.
Architecture Decisions
I split the kernel into three planes: control (scheduling, policy), data (IPC, I/O paths), and learning (telemetry, reward signals). The learning plane logs scheduler decisions, cache hits/misses, IRQ storms, and page faults, then runs tiny inference on-device to bias policies. Early versions used simple heuristics; later versions swapped in a distilled model for predicting contention and prefetching pages before context switches. The hardest part was keeping the learning loop cheap enough to avoid ruining tail latency.
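A rough sketch of the shape of that learning plane, assuming a ring-buffer telemetry log and an integer EWMA as the "cheap" bias signal; names like telem_log and sched_bias are illustrative, not the actual LuisOS code:

```c
/* Sketch of a learning-plane telemetry record and a cheap bias hook.
   All names here are illustrative only. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

enum telem_kind { TELEM_SCHED, TELEM_CACHE_MISS, TELEM_IRQ, TELEM_PAGE_FAULT };

struct telem_event {
    uint64_t tick;       /* timer tick when the event was logged      */
    uint16_t kind;       /* enum telem_kind                           */
    uint16_t tid;        /* thread the event is attributed to         */
    uint32_t cost;       /* cycles lost (miss penalty, fault service) */
};

#define TELEM_RING 256
static struct telem_event ring[TELEM_RING];
static size_t ring_head;

/* O(1) append; overwrites the oldest entry, so logging never blocks. */
static void telem_log(uint64_t tick, uint16_t kind, uint16_t tid, uint32_t cost) {
    ring[ring_head % TELEM_RING] = (struct telem_event){ tick, kind, tid, cost };
    ring_head++;
}

/* Early "heuristic" learning plane: an exponentially weighted contention
   score per thread, consulted by the scheduler to nudge priority a little. */
#define MAX_THREADS 64
static uint32_t contention_ewma[MAX_THREADS];

static void telem_update_ewma(uint16_t tid, uint32_t cost) {
    /* new = 7/8 old + 1/8 sample, in integer math to stay cheap. */
    contention_ewma[tid] = (contention_ewma[tid] * 7 + cost) / 8;
}

/* Returns a small signed bias; the scheduler adds it to the base priority. */
static int sched_bias(uint16_t tid) {
    if (contention_ewma[tid] > 10000) return -2;  /* heavy contender: demote */
    if (contention_ewma[tid] <  1000) return +1;  /* well-behaved: promote   */
    return 0;
}

int main(void) {
    telem_log(100, TELEM_CACHE_MISS, 3, 90000);
    telem_update_ewma(3, 90000);
    printf("bias for tid 3: %d\n", sched_bias(3));   /* -2: demoted */
    return 0;
}
```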
Scheduler Experiments
I benchmarked a baseline MLFQ against three variants:
- Heuristic-aware: penalize lock-heavy threads; reward cache-friendly bursts.
- Deadline-aware: soft EDF for audio/vision tasks with slack stealing.
- Learned bias: a tiny model nudging priorities based on past contention windows.
The learned bias won on mixed workloads with NPUs + CPU contention: ~11% fewer tail spikes at p99 compared to MLFQ, with negligible CPU tax. On pure CPU workloads it regressed slightly, so I gated it behind a workload detector.
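A minimal sketch of that gating, with a stand-in detector and a stand-in for the distilled model; the thresholds and names here are illustrative only:

```c
/* Sketch of gating a learned priority bias behind a workload detector. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Crude workload detector: call the workload "mixed" only when a
   meaningful share of recently ready threads were NPU-bound. */
static bool workload_is_mixed(unsigned npu_ready, unsigned cpu_ready) {
    unsigned total = npu_ready + cpu_ready;
    return total > 0 && npu_ready * 100 / total >= 20;   /* >= 20% NPU-bound */
}

/* Stand-in for the distilled model: maps a recent-contention feature to a
   small priority nudge.  A real model would be a quantized lookup/MLP. */
static int learned_bias(uint32_t contention_feature) {
    if (contention_feature > 800) return -2;
    if (contention_feature < 100) return +1;
    return 0;
}

/* Effective priority: plain MLFQ priority on pure-CPU workloads (where the
   learned variant regressed), MLFQ + nudge on mixed CPU/NPU workloads. */
static int effective_priority(int mlfq_prio, uint32_t contention_feature,
                              unsigned npu_ready, unsigned cpu_ready) {
    if (!workload_is_mixed(npu_ready, cpu_ready))
        return mlfq_prio;
    return mlfq_prio + learned_bias(contention_feature);
}

int main(void) {
    printf("pure CPU : %d\n", effective_priority(31, 900, 0, 8));  /* 31 */
    printf("mixed    : %d\n", effective_priority(31, 900, 3, 5));  /* 29 */
    return 0;
}
```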
Kernel Threads & Scheduling
Preemption sounds easy; it's not. A few takeaways:
- Context switching: getting FPU/SSE state saves and restores right matters; miss it and you chase ghost crashes.
- Priority inversion: introduced priority donation to keep high-priority tasks from starving behind locks (sketched after this list).
- Timer granularity: coarse timers make the system feel sluggish; overly fine ones grind the CPU on interrupt overhead.
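For the priority-inversion point, here is a simplified single-level donation sketch (real implementations also handle nested donation); the names are illustrative, not the LuisOS code:

```c
/* Simplified single-level priority donation: a thread that blocks on a held
   lock lends its priority to the holder, so the holder isn't starved by
   medium-priority work and can release sooner. */
#include <stdio.h>

struct thread {
    const char *name;
    int base_priority;       /* priority the thread asked for          */
    int effective_priority;  /* base plus any donation currently held  */
};

struct lock {
    struct thread *holder;   /* NULL when the lock is free             */
};

/* Called when a waiter finds the lock held, before it blocks. */
static void donate_if_held(struct lock *lk, struct thread *waiter) {
    if (lk->holder && lk->holder->effective_priority < waiter->effective_priority)
        lk->holder->effective_priority = waiter->effective_priority;
}

/* Releasing the lock drops the donation back to the holder's base priority. */
static void lock_release(struct lock *lk) {
    lk->holder->effective_priority = lk->holder->base_priority;
    lk->holder = NULL;
}

int main(void) {
    struct thread low  = { "low",  1, 1 };
    struct thread high = { "high", 9, 9 };
    struct lock lk = { &low };              /* low already holds the lock  */

    donate_if_held(&lk, &high);             /* high blocks; low is boosted */
    printf("%s runs at priority %d\n", low.name, low.effective_priority);

    lock_release(&lk);                      /* low finishes and releases   */
    printf("%s back at priority %d\n", low.name, low.effective_priority);
    return 0;
}
```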
User Programs & Syscalls
The user/kernel boundary is where most bugs hid. I focused on:
- Validating pointers defensively—never trust user space (sketch after this list).
- Clear error codes for syscalls; silent failures make debugging impossible.
- Copy-on-write experiments improved performance but complicated page fault handling.
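For the pointer-validation point, a minimal sketch of checking a user buffer at the syscall boundary, assuming Pintos-style helpers like is_user_vaddr and pagedir_get_page (stubbed here); illustrative only:

```c
/* Sketch of defensive user-pointer validation at the syscall boundary.
   is_user_vaddr/pagedir_get_page mirror Pintos-style helpers; the stub
   below stands in for the real page-table walk. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PHYS_BASE ((const void *) 0xC0000000)   /* start of kernel space */

static bool is_user_vaddr(const void *va) { return va < PHYS_BASE; }

/* Stand-in for the page-table lookup: returns the kernel alias for a mapped
   user page, or NULL if the page isn't mapped.  Real code walks the pagedir. */
static void *pagedir_get_page(const void *uaddr) { (void) uaddr; return NULL; }

/* Every pointer argument a syscall receives goes through this before use.
   Checks both ends of the buffer and every page boundary in between. */
static bool validate_user_buffer(const void *uaddr, size_t size) {
    if (uaddr == NULL || size == 0) return false;
    const uint8_t *start = uaddr;
    const uint8_t *end = start + size - 1;
    if (!is_user_vaddr(start) || !is_user_vaddr(end) || end < start)
        return false;
    /* Check each page the buffer touches; an unmapped page means reject. */
    for (const uint8_t *p = (const uint8_t *) ((uintptr_t) start & ~(uintptr_t) 0xFFF);
         p <= end; p += 0x1000)
        if (pagedir_get_page(p) == NULL)
            return false;
    return true;
}
```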
Security and Isolation
I added a lightweight capability model for syscalls exposed to untrusted user programs. Capabilities could be revoked, time-bounded, or rate-limited. Simple ptrace-style hooks plus guarded syscalls caught most user/kernel abuse during fuzzing. Memory tagging would be next, but within Pintos-era constraints I settled for guard pages and aggressive zeroing.
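A sketch of what such a capability check could look like, using a token-bucket rate limit; the struct layout and cap_check name are placeholders, not the actual implementation:

```c
/* Sketch of a lightweight syscall capability: revocable, time-bounded, and
   rate-limited.  Field names and the check function are illustrative only. */
#include <stdbool.h>
#include <stdint.h>

struct capability {
    uint32_t syscall_mask;   /* bit i set => syscall i is allowed           */
    uint64_t expires_tick;   /* 0 = no expiry; otherwise deny after this    */
    uint32_t tokens;         /* remaining calls in the current window;
                                refilled from the timer tick (not shown)    */
    uint32_t refill_per_sec; /* token-bucket refill rate                    */
    bool     revoked;        /* set asynchronously to cut access off        */
};

/* Called at syscall entry; ticks come from the kernel timer. */
static bool cap_check(struct capability *cap, unsigned sysno, uint64_t now_tick) {
    if (cap->revoked) return false;
    if (cap->expires_tick && now_tick > cap->expires_tick) return false;
    if (sysno >= 32 || !(cap->syscall_mask & (1u << sysno))) return false;
    if (cap->tokens == 0) return false;           /* rate limit exhausted   */
    cap->tokens--;
    return true;
}
```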
Virtual Memory & Files
Paging and the file system forced discipline:
- Page replacement: a simple clock algorithm beat more complex heuristics under load (sketched after this list).
- Write-back vs. write-through: hybrid caching reduced I/O while keeping corruption risk low.
- Atomicity: journaling-lite for metadata so crashes don't nuke the directory tree.
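For the page-replacement point, a minimal sketch of the clock (second-chance) algorithm on a tiny simulated frame table; illustrative, not the LuisOS code:

```c
/* Clock (second-chance) replacement: sweep a circular list of frames,
   clearing reference bits as you go, and evict the first frame whose
   bit is already clear. */
#include <stdbool.h>
#include <stdio.h>

#define NFRAMES 4

struct frame {
    int  page;        /* which virtual page occupies this frame */
    bool referenced;  /* hardware-set accessed bit (simulated)  */
};

static struct frame frames[NFRAMES] = {
    { 10, true }, { 11, false }, { 12, true }, { 13, true }
};
static int clock_hand;

/* Returns the frame index to evict. */
static int clock_evict(void) {
    for (;;) {
        struct frame *f = &frames[clock_hand];
        if (!f->referenced) {
            int victim = clock_hand;
            clock_hand = (clock_hand + 1) % NFRAMES;
            return victim;                 /* second chance already used */
        }
        f->referenced = false;             /* give it a second chance    */
        clock_hand = (clock_hand + 1) % NFRAMES;
    }
}

int main(void) {
    int victim = clock_evict();
    printf("evict frame %d (page %d)\n", victim, frames[victim].page);
    return 0;
}
```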
Device Model and NPUs
LuisOS treats NPUs as first-class schedulable devices. I built a tiny command queue with back-pressure so GPU/NPU kernels wouldn’t starve CPU threads. For transformer inference, batching small requests reduced end-to-end latency more than chasing kernel micro-optimizations. The OS’s job was orchestrating the right batching window without blocking interactive tasks.
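A sketch of the idea, assuming a bounded queue with a fixed batch size and a tick-based window; the constants and names are placeholders:

```c
/* Sketch of a bounded NPU command queue with back-pressure and a simple
   batching window: submissions fail fast when the queue is full, and the
   queue is flushed when a batch fills or the window expires. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define QUEUE_CAP   32
#define BATCH_SIZE   8          /* flush once this many requests queue up  */
#define WINDOW_TICKS 2          /* ...or after this many timer ticks       */

struct npu_queue {
    int      pending;           /* requests waiting to be issued           */
    uint64_t window_start;      /* tick when the current batch was opened  */
};

/* Returns false when the queue is full, so callers back off instead of
   piling up work and starving CPU threads behind the device. */
static bool npu_submit(struct npu_queue *q, uint64_t now) {
    if (q->pending >= QUEUE_CAP) return false;   /* back-pressure */
    if (q->pending == 0) q->window_start = now;
    q->pending++;
    return true;
}

/* Called from the timer path: flush when the batch is full or the window
   has expired, so interactive tasks never wait behind a half-built batch. */
static bool npu_should_flush(const struct npu_queue *q, uint64_t now) {
    return q->pending >= BATCH_SIZE ||
           (q->pending > 0 && now - q->window_start >= WINDOW_TICKS);
}

int main(void) {
    struct npu_queue q = { 0, 0 };
    for (uint64_t tick = 0; tick < 5; tick++) {
        npu_submit(&q, tick);
        if (npu_should_flush(&q, tick)) {
            printf("tick %llu: flush batch of %d\n",
                   (unsigned long long) tick, q.pending);
            q.pending = 0;
        }
    }
    return 0;
}
```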
Testing & Debugging
The biggest unlock was building a brutal test harness:
- Deterministic repros: seedable workloads that hammer threads, syscalls, and I/O in parallel.
- Panic breadcrumbs: short, consistent logs beat verbose dumps when you're low on time.
- Fault injection: intentionally corrupting frames caught assumptions I'd never have questioned otherwise.
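A sketch of the deterministic fault-injection hook, using a seedable xorshift PRNG so any crash can be replayed from the same seed; names are illustrative:

```c
/* Deterministic fault injection: a seedable PRNG decides when to corrupt a
   frame or fail an allocation, so every crash replays from the same seed. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t rng_state;

static void fault_seed(uint64_t seed) { rng_state = seed ? seed : 1; }

/* xorshift64: tiny, deterministic, good enough for injection decisions. */
static uint64_t rng_next(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Drop this at sketchy spots (frame alloc, I/O completion, syscall copy-in);
   rate is "1 fault per N calls" on average. */
static bool should_inject_fault(unsigned one_in_n) {
    return rng_next() % one_in_n == 0;
}

int main(void) {
    fault_seed(42);                         /* same seed => same repro */
    for (int i = 0; i < 10; i++)
        if (should_inject_fault(4))
            printf("call %d: injected fault\n", i);
    return 0;
}
```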
AI-Native Patterns I Want Next
- Reward-model schedulers: RLHF-style signals on latency, jitter, and energy to tune policies.
- Semantic I/O: tagging data flows (camera, mic, LIDAR) so the kernel can prioritize based on task graphs.
- Adaptive security: anomaly detection on syscall patterns to auto-throttle suspicious processes.
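A rough sketch of what that adaptive-security throttle could look like; nothing here exists in LuisOS yet, and the names and thresholds are placeholders:

```c
/* Idea sketch: track a per-process baseline of syscall rate and throttle
   when the current rate deviates far from it. */
#include <stdbool.h>
#include <stdint.h>

struct syscall_profile {
    uint32_t baseline_rate;   /* learned calls-per-window under normal use */
    uint32_t current_rate;    /* calls observed in the current window      */
    uint32_t throttle_ticks;  /* > 0 => process is being slowed down       */
};

/* Called once per sampling window from the timer path. */
static void anomaly_update(struct syscall_profile *p) {
    /* Flag a burst more than 4x the learned baseline as suspicious. */
    if (p->baseline_rate > 0 && p->current_rate > 4 * p->baseline_rate) {
        p->throttle_ticks = 100;            /* auto-throttle for a while   */
    } else {
        /* Slowly fold the observed rate into the baseline (integer EWMA). */
        p->baseline_rate = (p->baseline_rate * 7 + p->current_rate) / 8;
        if (p->throttle_ticks > 0) p->throttle_ticks--;
    }
    p->current_rate = 0;                    /* start the next window       */
}

/* Consulted at syscall entry; a throttled process gets delayed or denied. */
static bool syscall_allowed(const struct syscall_profile *p) {
    return p->throttle_ticks == 0;
}
```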
What It Means for Builders
Building LuisOS changed how I approach products: measure everything, design for interrupts, and expect cross-layer effects. Modern apps are really distributed systems across CPU, GPU/NPU, storage, and network; the kernel mindset—tight loops, explicit tradeoffs—translates directly to shipping reliable AI products.
What I'd Do Differently
If I had a second pass:
- Abstract the scheduler earlier; retrofitting different policies was painful.
- Add richer tracing hooks from day one; printf debugging in kernel land is misery.
- Invest in better tooling: symbolized stack traces and automated bisecting on kernel changes.
Takeaways
Building an OS is the fastest way to respect how much is hidden by modern runtimes. The work forces clarity: every allocation, every lock, every interrupt has a cost. That intuition now informs how I design higher-level systems—fewer assumptions, more instrumentation, and tight loops between design, measurement, and iteration.