Single-agent LLMs suck at long-running complex tasks.
We’ve open-sourced a multi-agent orchestrator that we’ve been using to handle long-running LLM tasks. We found that single LLM agents tend to stall, loop, or generate non-compiling code, so we built a harness for agents to coordinate over shared context while work is in progress.
How it works:
1. Orchestrator agent that manages task decomposition
2. Sub-agents for parallel work
3. Subscriptions to task state and progress
4. Real-time sharing of intermediate discoveries between agents
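The four pieces above can be sketched in a few dozen lines. This is a minimal illustration of the pattern, not the released skill's actual code: a shared "blackboard" holds task state, sub-agents run in parallel, and subscribers get notified the moment an intermediate discovery lands (all names here are invented for the example).

```python
# Sketch of the coordination pattern: orchestrator decomposes a task,
# sub-agents work in parallel, and a shared blackboard pushes each
# agent's intermediate findings to subscribers in real time.
# Illustrative only -- not the skill's actual API.
import threading
from concurrent.futures import ThreadPoolExecutor


class Blackboard:
    """Shared context: agents publish findings; subscribers are notified."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}        # task_id -> latest finding
        self._subscribers = []  # callbacks fired on every update

    def subscribe(self, callback):
        with self._lock:
            self._subscribers.append(callback)

    def publish(self, task_id, finding):
        with self._lock:
            self._state[task_id] = finding
            subs = list(self._subscribers)
        for cb in subs:  # notify outside the lock to avoid deadlock
            cb(task_id, finding)


def sub_agent(task_id, subtask, board):
    """Stand-in for an LLM call; publishes its result as it finishes."""
    result = f"solved:{subtask}"
    board.publish(task_id, result)
    return result


def orchestrate(task, board, n_parts=3):
    """Decompose the task, fan out to parallel sub-agents, merge results."""
    subtasks = [f"{task}/part{i}" for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        futures = [pool.submit(sub_agent, i, st, board)
                   for i, st in enumerate(subtasks)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    board = Blackboard()
    seen = []
    board.subscribe(lambda tid, finding: seen.append((tid, finding)))
    print(orchestrate("refactor", board))  # merged sub-results, in order
    print(len(seen))                       # every discovery was shared
```

In the real system the `sub_agent` body is an LLM call and the blackboard is the shared context agents subscribe to; the point of the sketch is the shape of the coordination, not the work inside each agent.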
We tested this on a Putnam-level math problem, but the pattern generalizes to things like large refactors, app builds, and long-running research.
It’s packaged as a Claude Code skill and designed to be small, readable, and modifiable.
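For anyone unfamiliar with the packaging: a Claude Code skill is essentially a folder containing a SKILL.md whose YAML frontmatter tells Claude when to load it. A minimal layout might look like the fragment below (the name and description are invented for illustration, not copied from the repo):

```markdown
---
name: multi-agent-orchestrator
description: Decompose a long-running task and coordinate parallel sub-agents
  over shared context. Use for large refactors, app builds, or long research.
---

# Multi-Agent Orchestrator

Spawn an orchestrator agent, fan work out to sub-agents, and share
intermediate discoveries through the subscription mechanism.
```

Keeping the whole thing in one small, readable file is what makes it easy to break open and modify.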
Use it, break it, and tell me what workloads we should run next!
At some point the interesting question isn’t whether one agent or twenty agents can coordinate better, but which decisions we’re comfortable fully delegating versus which ones feel like they need a human checkpoint.
Multi-agent systems solve coordination and memory scaling, but they also make it easier to move further away from direct human oversight. I’m curious how people here think about where that boundary should sit — especially for tasks that have real downstream consequences.