What I Learned Building My Own Operating System

January 12, 2025 | By Luis Sanchez

Notes from building LuisOS: a functional OS with kernel threads, user programs, and a file system.

Why I Built It

LuisOS started as a way to understand computers beyond APIs. I wanted to see every layer, from bootloader to scheduler to file system, and make it work under real hardware constraints. I used the Pintos teaching framework as a base, but the meaningful parts (threads, file system, user programs) were built and tuned by hand.

Berkeley CS162 + AI-Native OS Mentorship

I took UC Berkeley's CS162 with Ion Stoica and Matei Zaharia (co-founders of Databricks), and later worked under their mentorship on AI-optimized operating systems. Their work (Spark, Sky Lab's AI-driven schedulers, and ADRS research) shaped how I think about kernels that learn. Instead of bolting ML onto Linux or Windows after the fact, the new wave builds AI in from the start: predictive schedulers, adaptive security, and agentic orchestration that cut latency for on-device inference on NPUs. Systems like Steve's proactive maintenance agents or AthenaOS's Rust-based swarms show how self-optimizing kernels can learn usage patterns and expose rich signals for future ML models. The Databricks view: what Spark did for distributed data, AI-native kernels can do at the endpoint, with faster inference, better scheduling, and unified data/AI stacks for edge and enterprise clusters.

Architecture Decisions

I split the kernel into three planes: control (scheduling, policy), data (IPC, I/O paths), and learning (telemetry, reward signals). The learning plane logs scheduler decisions, cache hits/misses, IRQ storms, and page faults, then runs tiny inference on-device to bias policies. Early versions used simple heuristics; later versions swapped in a distilled model for predicting contention and prefetching pages before context switches. The hardest part was keeping the learning loop cheap enough to avoid ruining tail latency.
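
To make that concrete, here is a minimal sketch of the learning-plane telemetry path: a single-producer ring buffer feeding a per-thread contention score that the control plane can read as a bias. The event kinds, ring size, and score table are illustrative assumptions, not the actual LuisOS structures.

```c
#include <stdbool.h>
#include <stdint.h>

enum event_kind { EV_SCHED, EV_CACHE_MISS, EV_IRQ, EV_PAGE_FAULT };

struct telemetry_event {
    enum event_kind kind;
    uint32_t tid;        /* thread that triggered the event */
    uint64_t timestamp;  /* ticks at time of logging */
};

#define RING_CAP 1024
static struct telemetry_event ring[RING_CAP];
static volatile uint32_t head, tail;

/* Producer side: called from hot paths, so it is O(1) and drops
 * events rather than blocking when the ring is full. */
static bool log_event(struct telemetry_event ev)
{
    uint32_t next = (head + 1) % RING_CAP;
    if (next == tail)
        return false;   /* full: dropping beats stalling the hot path */
    ring[head] = ev;
    head = next;
    return true;
}

/* Learning plane: drain the ring and fold events into a per-thread
 * contention score that the scheduler can read as a priority bias. */
static int contention_score[256];

static void drain_and_update(void)
{
    while (tail != head) {
        struct telemetry_event *ev = &ring[tail];
        if (ev->kind == EV_CACHE_MISS || ev->kind == EV_PAGE_FAULT)
            contention_score[ev->tid % 256]++;
        tail = (tail + 1) % RING_CAP;
    }
}

int main(void)
{
    log_event((struct telemetry_event) { EV_PAGE_FAULT, 7, 100 });
    drain_and_update();
    return contention_score[7];   /* 1 after the fault above */
}
```

Dropping events on overflow was a deliberate choice: losing a telemetry sample is free, while blocking a hot path is exactly the tail-latency damage the learning loop must avoid.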

Scheduler Experiments

I benchmarked a baseline MLFQ against three variants:

  • Heuristic-aware: penalize lock-heavy threads; reward cache-friendly bursts.
  • Deadline-aware: soft EDF for audio/vision tasks with slack stealing.
  • Learned bias: a tiny model nudging priorities based on past contention windows.

The learned bias won on mixed workloads with NPUs + CPU contention: ~11% fewer tail spikes at p99 compared to MLFQ, with negligible CPU tax. On pure CPU workloads it regressed slightly, so I gated it behind a workload detector.
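
A rough sketch of that gating logic is below. The detector thresholds, the clamp value, and the stand-in for the distilled model are illustrative placeholders, not the tuned LuisOS versions.

```c
#include <stdbool.h>

struct thread { int tid; int base_priority; int effective_priority; };

/* Hypothetical detector: call the workload "mixed" when the NPU has
 * queued work AND the CPU run queue is deep; thresholds are guesses. */
static bool workload_is_mixed(int npu_queue_depth, int runq_len)
{
    return npu_queue_depth > 0 && runq_len > 4;
}

/* Stand-in for the distilled model: map the recent contention window
 * to a small nudge, clamped so ML can never starve a thread. */
static int predict_bias(int contention_window)
{
    int bias = contention_window / 8;   /* crude proxy for inference */
    return bias > 3 ? 3 : bias;
}

static void apply_priority(struct thread *t, int contention_window,
                           int npu_queue_depth, int runq_len)
{
    t->effective_priority = t->base_priority;
    if (workload_is_mixed(npu_queue_depth, runq_len))
        t->effective_priority += predict_bias(contention_window);
    /* Pure CPU workloads fail the detector and fall back to plain
     * MLFQ priorities, avoiding the regression seen in benchmarks. */
}

int main(void)
{
    struct thread t = { .tid = 1, .base_priority = 31 };
    apply_priority(&t, 40, 2, 8);     /* mixed workload: bias applies */
    return t.effective_priority;      /* 31 + clamp(40/8) = 34 */
}
```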

Kernel Threads & Scheduling

Preemption sounds easy; it's not. A few takeaways:

  • Context switching: saving FPU/SSE state correctly matters; missing it means ghost crashes.
  • Priority inversion: introduced priority donation to keep high-priority tasks from starving behind locks (sketched after this list).
  • Timer granularity: coarse timers make the system feel sluggish; too fine grinds the CPU.
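
Here is a minimal priority-donation sketch in the spirit of Pintos. The field names and the transitive-donation walk are simplified, and recomputing from remaining donors on release is omitted for brevity.

```c
#include <stddef.h>

struct lock;

struct thread {
    int base_priority;
    int priority;             /* effective priority after donations */
    struct lock *waiting_on;  /* lock this thread is blocked on */
};

struct lock {
    struct thread *holder;
};

/* When a high-priority thread blocks on a lock, push its priority
 * down the chain of holders so the owner runs and releases soon.
 * Donation must be transitive: A waits on B, B waits on C. */
static void donate_priority(struct thread *donor)
{
    struct lock *l = donor->waiting_on;
    while (l != NULL && l->holder != NULL) {
        if (l->holder->priority < donor->priority)
            l->holder->priority = donor->priority;
        l = l->holder->waiting_on;   /* follow the chain */
    }
}

/* On release, the holder falls back to its base priority; a full
 * version would recompute from any remaining donors. */
static void revoke_donation(struct thread *holder)
{
    holder->priority = holder->base_priority;
}

int main(void)
{
    struct thread low  = { .base_priority = 1,  .priority = 1 };
    struct lock l      = { .holder = &low };
    struct thread high = { .base_priority = 60, .priority = 60,
                           .waiting_on = &l };
    donate_priority(&high);   /* low now runs at priority 60 */
    revoke_donation(&low);    /* back to 1 after the release */
    return low.priority;
}
```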

User Programs & Syscalls

The user/kernel boundary is where most bugs hid. I focused on:

  • Validating pointers defensively: never trust user space (see the sketch after this list).
  • Clear error codes for syscalls; silent failures make debugging impossible.
  • Copy-on-write: experiments improved performance but complicated page fault handling.
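
A sketch of the defensive pointer check, assuming Pintos-style conventions (a PHYS_BASE split and a pagedir_get_page probe). The stubs only exist to make the sketch self-contained; the real kernel supplies them.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define PHYS_BASE ((uintptr_t) 0xC0000000)   /* kernel space starts here */
#define PGSIZE 4096

/* Stubs standing in for the real page-table probe (Pintos calls this
 * pagedir_get_page); here every user page pretends to be mapped. */
static void *current_pagedir(void) { return NULL; }
static void *pagedir_get_page(void *pd, const void *uaddr)
{
    (void) pd; (void) uaddr;
    return (void *) 1;   /* non-NULL means "mapped" in this sketch */
}

/* A user buffer is valid only if every page it touches lies below
 * PHYS_BASE and is actually mapped; check before any dereference. */
static bool validate_user_buffer(const void *uaddr, size_t size)
{
    uintptr_t start = (uintptr_t) uaddr;
    uintptr_t end = start + size;
    if (size == 0)
        return true;
    if (end > PHYS_BASE || end < start)   /* bounds + wraparound */
        return false;
    for (uintptr_t p = start; p < end; p += PGSIZE)
        if (pagedir_get_page(current_pagedir(), (void *) p) == NULL)
            return false;
    /* the loop strides by page; also probe the final byte's page */
    return pagedir_get_page(current_pagedir(), (void *) (end - 1)) != NULL;
}

int main(void)
{
    return validate_user_buffer((void *) 0x804a000, 128) ? 0 : 1;
}
```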

Security and Isolation

I added a lightweight capability model for syscalls exposed to untrusted user programs. Capabilities could be revoked, time-bounded, or rate-limited. Simple ptrace-style hooks plus guarded syscalls caught most user/kernel abuse during fuzzing. Memory tagging would be next, but under Pintos-era constraints I settled for guard pages and aggressive zeroing.
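
As an illustration, a capability record and its check might look like the sketch below. The field layout, token refill, and bitmask encoding are assumptions, not the actual LuisOS definitions.

```c
#include <stdbool.h>
#include <stdint.h>

struct capability {
    uint32_t syscall_mask;   /* one bit per permitted syscall */
    uint64_t expires_at;     /* tick after which the cap is dead */
    uint32_t tokens;         /* rate limit, refilled by a timer */
    bool revoked;
};

/* Gate every syscall from an untrusted program through this check;
 * the point is to fail closed on every condition. */
static bool cap_permits(struct capability *cap, int syscall_nr,
                        uint64_t now)
{
    if (cap->revoked || now > cap->expires_at)
        return false;                       /* revoked or expired */
    if (!(cap->syscall_mask & (1u << syscall_nr)))
        return false;                       /* syscall not granted */
    if (cap->tokens == 0)
        return false;                       /* rate limit exhausted */
    cap->tokens--;
    return true;
}

int main(void)
{
    struct capability cap = { .syscall_mask = 1u << 4,
                              .expires_at = 1000, .tokens = 2 };
    return cap_permits(&cap, 4, 500) ? 0 : 1;   /* permitted: 0 */
}
```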

Virtual Memory & Files

Paging and the file system forced discipline:

  • Page replacement: a simple clock algorithm beat more complex heuristics under load (sketched after this list).
  • Write-back vs. write-through: hybrid caching reduced I/O while keeping corruption risk low.
  • Atomicity: journaling-lite for metadata so crashes don't nuke the directory tree.
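
For reference, the clock (second-chance) algorithm is small enough to sketch whole; the frame-table layout here is illustrative. Its appeal is O(1) state per frame and no sorted structures to maintain.

```c
#include <stdbool.h>
#include <stddef.h>

#define NFRAMES 256

struct frame {
    bool referenced;   /* set when the page was recently accessed */
    bool in_use;
};

static struct frame frames[NFRAMES];
static size_t clock_hand;

/* Sweep the hand around the frame table: each referenced frame gets
 * a second chance (its bit is cleared); the first unreferenced frame
 * we meet is the victim. Worst case is one full rotation. */
static size_t clock_evict(void)
{
    for (;;) {
        struct frame *f = &frames[clock_hand];
        size_t victim = clock_hand;
        clock_hand = (clock_hand + 1) % NFRAMES;
        if (!f->in_use)
            return victim;           /* free frame, take it directly */
        if (f->referenced)
            f->referenced = false;   /* second chance */
        else
            return victim;           /* cold frame: evict */
    }
}

int main(void)
{
    for (size_t i = 0; i < NFRAMES; i++)
        frames[i] = (struct frame) { .referenced = true, .in_use = true };
    return (int) clock_evict();      /* frame 0, after one full sweep */
}
```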

Device Model and NPUs

LuisOS treats NPUs as first-class schedulable devices. I built a tiny command queue with back-pressure so GPU/NPU kernels wouldn’t starve CPU threads. For transformer inference, batching small requests reduced end-to-end latency more than chasing kernel micro-optimizations. The OS’s job was orchestrating the right batching window without blocking interactive tasks.
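
A minimal sketch of that queue, assuming a fixed depth and a tick-based batching window; both tunables are illustrative, not the values LuisOS shipped with.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_DEPTH 64
#define BATCH_WINDOW_TICKS 2   /* wait briefly to coalesce requests */

struct npu_cmd { uint32_t req_id; uint32_t len; };

static struct npu_cmd queue[QUEUE_DEPTH];
static uint32_t q_head, q_tail;

/* Submission fails instead of blocking, so CPU threads back off and
 * are never stalled behind a saturated accelerator. */
static bool npu_submit(struct npu_cmd cmd)
{
    uint32_t next = (q_head + 1) % QUEUE_DEPTH;
    if (next == q_tail)
        return false;        /* back-pressure: caller retries later */
    queue[q_head] = cmd;
    q_head = next;
    return true;
}

/* Flush when the batching window closes or the queue is half full,
 * whichever comes first: small batches amortize launch overhead
 * without making interactive work wait behind a long batch. */
static bool should_flush(uint64_t now, uint64_t window_start)
{
    uint32_t depth = (q_head + QUEUE_DEPTH - q_tail) % QUEUE_DEPTH;
    return depth >= QUEUE_DEPTH / 2 ||
           (depth > 0 && now - window_start >= BATCH_WINDOW_TICKS);
}

int main(void)
{
    for (uint32_t i = 0; i < 40; i++)
        npu_submit((struct npu_cmd) { .req_id = i, .len = 64 });
    return should_flush(10, 9) ? 0 : 1;   /* depth 40 >= 32: flush */
}
```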

Testing & Debugging

The biggest unlock was building a brutal test harness:

  • Deterministic repros: seedable workloads that hammer threads, syscalls, and I/O in parallel (see the sketch after this list).
  • Panic breadcrumbs: short, consistent logs beat verbose dumps when you're low on time.
  • Fault injection: intentionally corrupting frames exposed assumptions I didn't know I was making.
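
The core of the harness fits in a sketch: one seed drives a tiny PRNG, and that PRNG drives both the workload and the fault schedule, so any failure replays bit-for-bit. The xorshift constants and the ~3% fault rate are illustrative choices, not the harness's actual parameters.

```c
#include <stdint.h>
#include <stdio.h>

/* xorshift64: tiny deterministic PRNG. Avoid libc rand() in a repro
 * harness; its state is global and varies across platforms. */
static uint64_t rng_state;

static uint64_t rng_next(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Fault injection: turn a success into a failure on a seeded
 * schedule, so error paths run deterministically every time. */
static int maybe_fail(int real_result)
{
    return (rng_next() % 100 < 3) ? -1 : real_result;   /* ~3% faults */
}

int main(void)
{
    rng_state = 0xC162C162ULL;   /* the seed IS the repro ticket */
    for (int i = 0; i < 10; i++)
        printf("op %d -> %d\n", i, maybe_fail(0));
    return 0;
}
```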

AI-Native Patterns I Want Next

  • Reward-model schedulers: RLHF-style signals on latency, jitter, and energy to tune policies.
  • Semantic I/O: tagging data flows (camera, mic, LIDAR) so the kernel can prioritize based on task graphs.
  • Adaptive security: anomaly detection on syscall patterns to auto-throttle suspicious processes.

What It Means for Builders

Building LuisOS changed how I approach products: measure everything, design for interrupts, and expect cross-layer effects. Modern apps are really distributed systems across CPU, GPU/NPU, storage, and network; the kernel mindset—tight loops, explicit tradeoffs—translates directly to shipping reliable AI products.

What I'd Do Differently

If I had a second pass:

  • Abstract the scheduler earlier; retrofitting different policies was painful.
  • Add richer tracing hooks from day one; printf debugging in kernel land is misery (see the sketch below).
  • Invest in better tooling: symbolized stack traces and automated bisecting on kernel changes.
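
The kind of tracing hook I mean is small: a fixed-size breadcrumb ring that a panic handler can dump. This is a sketch with illustrative names and sizes, not what LuisOS actually has.

```c
#include <stdint.h>
#include <stdio.h>

#define TRACE_SLOTS 64
#define TRACE_MSG 48

static char trace_ring[TRACE_SLOTS][TRACE_MSG];
static uint32_t trace_idx;

/* Cheap enough to leave enabled in hot paths; the ring overwrites
 * itself, so the panic handler always has the last 64 breadcrumbs. */
#define TRACE(msg)                                                   \
    do {                                                             \
        snprintf(trace_ring[trace_idx % TRACE_SLOTS], TRACE_MSG,     \
                 "%s", (msg));                                       \
        trace_idx++;                                                 \
    } while (0)

/* Dump oldest-first so the log reads in event order. */
static void dump_trace_on_panic(void)
{
    for (uint32_t i = 0; i < TRACE_SLOTS; i++) {
        const char *slot = trace_ring[(trace_idx + i) % TRACE_SLOTS];
        if (slot[0] != '\0')
            puts(slot);
    }
}

int main(void)
{
    TRACE("sched: picked tid 7");
    TRACE("vm: page fault at 0x804a000");
    dump_trace_on_panic();
    return 0;
}
```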

Takeaways

Building an OS is the fastest way to respect how much is hidden by modern runtimes. The work forces clarity: every allocation, every lock, every interrupt has a cost. That intuition now informs how I design higher-level systems—fewer assumptions, more instrumentation, and tight loops between design, measurement, and iteration.