Notes on prompt engineering, regression testing, and shipping LLM features without breaking them.
How a GPT-4o to GPT-4.1 migration silently broke an extraction prompt — and the 6-line setup that would have caught it.
Six frontier models benchmarked on JSON extraction. Haiku is the value pick; Gemini Flash is fast and wrong.
5-minute setup: pin prompts, write test cases, gate PRs with a GitHub Action. Stop shipping vibes.