AI Platform Engineering

Building Lavc Systems: Architecture of a Local Multi-Agent AI Platform

Most AI tooling today assumes a cloud endpoint. Lavc Systems starts from a different premise: the entire execution stack — LLM, agents, memory, RAG, scheduler, and observability — runs on the developer's own machine. This post breaks down the engineering decisions that make that possible and the trade-offs each technology choice carries.

Published May 12, 2026 14 min read AI Platform Engineering

Why local-first matters for agent systems

Running an AI agent system in the cloud works well when you need to serve external users at scale. It works poorly when the data you need to reason over is internal, sensitive, or only valuable in aggregate on your own infrastructure. A company's task history, internal documents, API credentials, and operational logs should not leave the building just to run an AI query over them.

Lavc Systems was designed around this constraint. The system must be capable of reasoning over proprietary data, executing tasks, and maintaining memory across sessions — without any of that material touching a third-party server. Every technology choice in the stack flows from this principle.

The consequence is a tighter architecture than a cloud platform requires. You cannot assume infinite horizontal scaling. You must be deliberate about what stays in memory, what goes to disk, and what gets indexed in a vector database. Local-first AI infrastructure is not a simplified version of cloud AI — it is a different engineering discipline.

FastAPI as the backbone: why not Node.js

The backend runs on FastAPI with Python 3.14 and Uvicorn. The choice of Python over Node.js here is not primarily about performance — it is about ecosystem proximity. The libraries that matter most for agent systems — LangGraph, ChromaDB, HuggingFace, APScheduler, SQLAlchemy — are all Python-native. Building a Node.js backend and calling into Python services adds a process boundary that complicates state management and debugging without adding meaningful value.

FastAPI's async support maps naturally onto the execution model of an agent system where many operations are I/O-bound: waiting for a model response, reading from ChromaDB, querying SQLite. The modular router structure lets each domain — tasks, agents, documents, scheduler, objectives, observability — own its own endpoints without coupling into a monolithic handler file.

FastAPI's router pattern is the backend equivalent of a well-scoped module boundary: each router handles one domain, imports what it needs, and does not touch what it does not own. At the scale of a local platform, this discipline prevents the inevitable "single handler that does everything" failure mode.

Ollama: the local LLM runtime

Ollama handles model serving locally, exposing a compatible API that the backend calls exactly as it would call a cloud provider. The default model is qwen2.5-coder:7b, chosen for its strong code reasoning and reasonable resource footprint on developer hardware. The architecture does not bind to this model — switching to a different Ollama-supported model is a configuration change, not a code change.

The important engineering decision here is what Ollama solves at the system level: it removes the network dependency from the inference path. When an agent is mid-task and needs to call the model, the round-trip is local. This makes latency predictable, removes the risk of rate limiting during intensive execution, and eliminates the surface area where internal data could be exposed to an external provider.

The cost is hardware. A 7B-parameter model running on CPU is slower than a cloud API call. On a machine with a capable GPU, the difference narrows significantly. The system's design accounts for this by batching LLM calls at the orchestration layer rather than making redundant calls at each agent step.

LangGraph: orchestration as a state machine

LangGraph is the orchestration layer. Rather than writing imperative agent loops in Python, LangGraph models the multi-agent workflow as a directed graph where nodes are agent steps and edges are conditional transitions. This has several practical consequences for a system like Lavc Systems.

First, the coordinator agent's decisions become explicit graph transitions rather than control flow in a function body. When the coordinator decides to delegate to the programmer agent, that delegation is a named edge in the graph, not a call buried inside a conditional block. This makes the execution trace readable and debuggable — the Agent Graph screen in the UI renders this live, showing which nodes are active, which have completed, and what events have been emitted.

Second, LangGraph's state management is composable. Each agent step receives the current graph state, modifies its relevant slice, and passes updated state to the next node. This means the context available to each agent is explicit and type-checked, not an ambient object threaded through a call stack. For an audit trail that surfaces to users as an operational trace, explicit state transitions are significantly easier to summarise than implicit shared state.

The operational trace users see — intent detected, context consulted, tool or action used, next step — is not generated by asking the model to summarise its own reasoning. It is derived from the actual graph state transitions logged during execution. This makes it auditable and accurate rather than a model-generated approximation.

ChromaDB and the RAG architecture

ChromaDB is the vector database backing the RAG layer. The design choice to use ChromaDB over alternatives like Qdrant or Weaviate comes down to operational simplicity: ChromaDB runs in-process and stores its data on local disk with no separate service to manage. For a local-first system where operational overhead is a primary constraint, this matters.

RAG in Lavc Systems serves two purposes that are worth distinguishing. The first is document retrieval: when an agent needs to reason over internal documents — uploaded PDFs, policy files, technical references — it queries ChromaDB with an embedding of the current task context and retrieves the top-k semantically relevant chunks. The second purpose is artefact feedback: results generated by agents during task execution can be uploaded back into ChromaDB, making the output of one task available as context for future tasks.

Embeddings are generated using HuggingFace's sentence-transformers library, again running locally. The embedding model runs once at startup and stays resident in memory for the session. This removes the need to call an embedding API for every document chunk query, which would reintroduce the latency and privacy concerns that justified local infrastructure in the first place.

HybridMemory: two stores with one interface

Agent memory in Lavc Systems uses a HybridMemory abstraction that unifies two different storage backends: SQLite for structured records and ChromaDB for semantic retrieval.

The split is deliberate. Structured records — task history, user interactions, specific facts about the system state — belong in SQLite because they need to be queried deterministically. "What tasks did the programmer agent complete this week" is a SQL query, not a semantic search. But "what context is most relevant to this task" is a semantic question, and that belongs in ChromaDB.

HybridMemory lets the agent layer use a single interface and routes the query to the appropriate store based on the query type. This prevents the common failure mode where teams choose one storage paradigm and then contort all queries to fit it: forcing semantic lookups into SQL LIKE searches, or shoehorning structured filters into vector similarity scores.

APScheduler and the autonomous goal layer

APScheduler handles recurring jobs — tasks that need to run on a schedule rather than on user demand. This is more consequential than it sounds for an agent system. A significant fraction of operational value comes not from interactive AI conversations but from tasks that run reliably in the background: daily report generation, data aggregation, document reindexing, system health checks.

Above APScheduler sits the GoalEngine, which manages higher-order objectives. Goals are not scheduled jobs — they are multi-task missions with associated urgency, importance, and deadline factors that generate a priority_score. A goal like "prepare the weekly executive report" might decompose into several tasks: pull recent documents, summarise risks and opportunities, generate a Markdown report, and create follow-up tasks in the Kanban.

The relationship between the scheduler, the goal engine, and the Kanban is the operational spine of the system. The Kanban tracks execution state. The scheduler fires recurrence. The goal engine provides the higher-level intent that shapes which tasks get created and in what sequence.

React, Vite, and Zustand on the frontend

The frontend is a React 18 application built with Vite and styled with Tailwind CSS. State management uses Zustand. These are conventional choices for a TypeScript-first React app, and the interest is less in why each was chosen and more in how they fit the real-time requirements of an agent monitoring interface.

The Agent Graph screen is the most technically demanding part of the frontend. It renders the live multi-agent workflow — nodes, edges, active states, event counts, and timeline — and updates in real time as the backend pushes WebSocket events. This requires a Zustand store that can receive partial updates from the WebSocket handler and trigger selective re-renders without re-rendering the entire graph on each event.

The Kanban board has a different requirement: optimistic updates. When a user drags a task card to a new column, the UI must reflect that change immediately, before the PATCH request to /api/tasks/{id}/move has completed. If the request fails, the card must roll back. This pattern — optimistic mutation with failure rollback — is managed in the Zustand task store rather than inside the component, keeping the component layer free of coordination logic.

WebSocket as the real-time event bus

The /ws endpoint carries all real-time events from the backend to the frontend: task status changes, agent step completions, log entries, system health updates, and scheduler job results. Using a single WebSocket connection rather than polling multiple endpoints reduces server load and gives the UI a consistent event model regardless of which module generated the event.

The backend manages the WebSocket connection in ws.py, which keeps the connection management logic separate from the router modules. Each router emits events by calling into a shared event emitter; the WebSocket handler subscribes to that emitter and forwards events to connected clients. This means adding a new event type — say, a new agent action in a future module — requires only emitting the event from the new router, with no changes to the WebSocket transport layer.

Security: users, audit, and the local vault

A platform that stores API credentials, executes tasks autonomously, and connects to external tools needs a coherent security model even when running locally. Lavc Systems includes user and company management with scoped access, an audit log that records who did what and when, and a local credential vault for storing API keys and secrets used by tools and plugins.

The vault is intentionally not a cloud secrets manager. Storing credentials locally in a vault that the agent layer can read at runtime keeps the secrets off any external service while still enabling the automation that requires them. The trade-off is that local vault security is only as strong as the machine it runs on — which is an acceptable boundary for a system explicitly designed for on-premise operation.

Skyler: the same assistant, a different execution environment

Skyler appears in two places in the Lavc portfolio. The public version — Skyler Assistant — runs on Cloudflare Workers with manual RAG, provider fallback, and a lightweight orchestration model designed for public portfolio visitors. It answers questions about Patrick's work, stack, and experience. It does not have access to private data. It does not execute tasks. It is a read-only conversational interface over a fixed knowledge base.

Skyler inside Lavc Systems is a different implementation in the same conceptual role. She runs inside the FastAPI backend, has access to the SQLite database, can query ChromaDB, can create tasks, read uploaded files, write credentials to the local vault, and generate an operational trace of her own responses. The trace is not model-generated prose about what the model thought — it is a structured log of the actual state transitions executed during the response: intent detected, context source consulted, action taken, next recommended step.

The architectural lesson across both versions is that the quality of an AI assistant is not primarily determined by the model. It is determined by what data the assistant can reach, what actions it is permitted to take, and how transparent its reasoning is to the people relying on it. The public Skyler is useful because it has the right knowledge base and clear answer rules. The system Skyler is more capable because she has access to the full operational context of the platform she lives in.

Upgrading an AI assistant is not a model upgrade problem. It is an access and context problem. The same assistant design becomes more valuable as you expand what it can observe and what it is permitted to act on — while keeping the trace of that action readable to humans.

Observability as a design constraint

Agent systems fail in ways that are difficult to debug without structured observability. A task that produces a wrong result may have failed at the coordinator's planning step, the analyst's document retrieval step, the programmer's generation step, or the reviewer's validation step — and without trace data, the only signal is the output was wrong.

Lavc Systems treats observability as a first-class concern from the start. The observability module records traces, metrics, and execution events for every agent action. The Control screen exposes these as a dashboard. The task detail page shows the full execution log for a specific task, including which agents ran, what tools they called, what context they consulted, and how long each step took.

This is not primarily for debugging — it is for trust. An agent system that executes autonomously and produces results without any window into how it got there is difficult to rely on for consequential work. Observability is the mechanism by which the system earns the confidence to run unsupervised on increasingly important tasks.