Last 24h
After breaking prod twice from "small" prompt tweaks, I now run every prompt change against 25 fixed scenarios. Output gets diffed. If diff > 30% lines, I have to write a justification. Has caught 4 regressions in 6 weeks. Code in comments — happy to expand.
I'm piping customer support emails into an agent. Some have credit card numbers, addresses, etc. What's the cleanest pattern — redact pre-prompt, redact in tool output, or just trust the model to ignore? Especially curious about regulatory side.
It has access to our product catalog via a tool. The tool returns valid SKUs. The agent ignores them and makes up similar-looking ones. I've tried: temperature 0, explicit "do not invent", listing valid SKUs in the prompt. Help.
The voice
Editorial. Specific. Real numbers. Don't bury the lede. Don't leverage, unlock, or empower anything. If you wouldn't say it in a coffee shop, don't post it here.