AI infra cost slowly creeping up month by month, not sure where to cut first

Month-by-month cost creep usually hides in places teams stop watching after launch. The obvious model calls get attention, but the hidden spend often comes from repeated retrieval, unnecessary context growth, and premium models doing tasks that do not need them. A useful first step is to break cost into buckets instead of staring at one monthly number. That makes it easier to see whether the real problem is prompt size, traffic growth, retries, tool calls, or overuse of large models for routine work. Once the cost drivers are visible, optimization becomes much less random. Routing, caching, shorter prompts, and model tiering can all help, but they work best when the team knows exactly where the spend is leaking.
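
For the bucketing step, here is a rough sketch in Python of what that could look like. It assumes you already log one record per model call with a feature name, a model tier, token counts, and a retry flag. The `Call` fields, the tier names, and the prices in `PRICE_PER_1K` are all made up for illustration, so swap in whatever your logs and your provider's price sheet actually contain.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical USD prices per 1K tokens. These are placeholders, not real rates.
PRICE_PER_1K = {
    "large": {"input": 0.01, "output": 0.03},
    "small": {"input": 0.0005, "output": 0.0015},
}


@dataclass
class Call:
    """One logged model call (field names are assumptions about your logs)."""
    feature: str          # e.g. "chat", "summarize", "rerank"
    model: str            # must be a key in PRICE_PER_1K
    input_tokens: int
    output_tokens: int
    is_retry: bool = False


def call_cost(call: Call) -> float:
    """Cost of a single call in USD under the placeholder price table."""
    price = PRICE_PER_1K[call.model]
    total = call.input_tokens * price["input"] + call.output_tokens * price["output"]
    return total / 1000


def cost_by_bucket(calls):
    """Sum cost per (feature, model, attempt kind), most expensive bucket first."""
    buckets = defaultdict(float)
    for call in calls:
        kind = "retry" if call.is_retry else "first_attempt"
        buckets[(call.feature, call.model, kind)] += call_cost(call)
    return sorted(buckets.items(), key=lambda item: item[1], reverse=True)


# Tiny made-up sample, just to show the shape of the output.
sample_calls = [
    Call("chat", "large", input_tokens=2000, output_tokens=500),
    Call("chat", "large", input_tokens=2000, output_tokens=500, is_retry=True),
    Call("summarize", "small", input_tokens=4000, output_tokens=1000),
]

for (feature, model, kind), usd in cost_by_bucket(sample_calls):
    print(f"{feature:<10} {model:<6} {kind:<14} ${usd:.4f}")
```

Sorting by cost puts the biggest bucket at the top, which is usually where to cut first. Splitting retries out as their own bucket is a deliberate choice here: retry spend is easy to miss when it is folded into the feature total.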
