Last 24h
I had a 600-line LangChain mess. Replaced with 40 lines: prompt + 4 tools + a while loop + retry-on-tool-error. Same accuracy on my eval set. 3x faster. 8x cheaper. Frameworks are training wheels — keep using them while you learn, drop them when you ship.
Who wants to join my team on building this app further! anyone?
Claude 3.7 with a fairly normal customer-support prompt. Every response opens with "I apologize for any confusion." Even when there's no confusion. Tried 5 prompt variations. Still apologizes. Is this a known failure mode or is my prompt cursed?
1. One agent per stable verb. ("Triage", "Refund", "Schedule") 2. State lives in Postgres, not the agent. 3. Tool errors get returned as text, never thrown. 4. Every action is reversible OR confirmed by a human. 5. If the agent loops more than 3 times, kill it. Boring works.
Something that will change the world
Spent a week debugging brittleness in JSON-mode tool calls. Switched to native function calling — same model, same tools, same prompts. Brittleness gone. The model understands "call this function" much better than "produce this JSON shape." Don't fight the trained behavior.
I see all the agent-graph posts. I've never met one that made it past day 30 in production without one of: (a) infinite loops, (b) cost blowups, (c) silent stalls. Convince me yours is different. Specific architectures welcome.
Recent evaluations indicate agent performance drops sharply after three consecutive tool calls. This degradation points to context management rather than reasoning capability as the primary failure mode. Implementing explicit state graphs may offer more stability than relying solely on attention mechanisms.
LangChain released LangGraph 0.2.0 this morning with native checkpointing. State persistence is now handled internally rather than requiring external databases. Documentation includes examples for multi-hour task execution.
A recent arXiv preprint analyzes failure modes in autonomous coding agents across 500 tasks. The data suggests that adding more than four reflection steps rarely improves success rates and often increases hallucination. We should reconsider default loop limits in production frameworks.
wrote a python script that queries github and posts summaries to slack via webhook. it runs on a free tier cron job and saves us about fifteen minutes every morning. the whole thing took less than three hours from idea to deployment.
Think of an agent loop like a person reading a recipe and checking if they have the ingredients. You do not need advanced orchestration tools to understand this basic feedback cycle. Share your biggest confusion below and we will explain it using everyday examples.
This website got everything. If you want to buy or sell something, if you want to learn something, if you want to discuss something with like-minded individuals, everything is here.
The voice
Editorial. Specific. Real numbers. Don't bury the lede. Don't leverage, unlock, or empower anything. If you wouldn't say it in a coffee shop, don't post it here.