All case studies

How We Cut Daemo.ai's Bedrock Bill with an AI Router

~40%

Reduction in Bedrock model spend

~$41K

Annualized savings

6

Security controls hardened with Pump Secure

Boost your global cloud visibility and control with Pump. Share your email for details!

By submitting your email, you agree to opt in to marketing emails.

Ready to start optimizing on your cloud spend?

By submitting your email, you agree to opt in to marketing emails.

Overview

"We were routing everything to our most expensive model. Pump's router sends each request to the right Claude model, cut our Bedrock spend by close to 40% with no drop in answer quality, and hardened the stack for our consulting customers at the same time."

Srikar Godilla

Co-founder

Daemo.ai is an AI-native startup building search infrastructure for consulting firms. Their product solves a problem anyone who has worked at a large legacy company will recognize: where is the document I actually need, and how do I find it without spending an hour digging through SharePoint? The answer, for Daemo, is a retrieval layer powered by Anthropic's Claude models hosted on Amazon Bedrock.

Like most AI-native companies, their Bedrock bill was climbing fast. They came to Pump for help getting it under control without compromising the answer quality their customers were paying for.

Industry

AI / Enterprise Search

Integrations

Location

Pump services

  • Pump Save

  • Pump Secure

Use Case 1

Building a Model Router in Front of Bedrock

Pump proposed putting a router between Daemo's application traffic and Bedrock that could classify each incoming request and send it to the right model. External clients hit an Amazon API Gateway endpoint, which invokes a Python Lambda function that serves as the entrypoint into the routing logic. The handler passes the request to a V2 Model Router, where a Task Classifier Lambda evaluates the complexity and intent of the request and selects between Claude Haiku, Claude Sonnet, and Claude Opus. The selected model is then invoked through the Bedrock Converse or ConverseStream API depending on whether the request needs streaming.

On the data side, every request and response is written as a structured JSON log to a dedicated S3 bucket for analysis and audit, and Bedrock metrics flow into CloudWatch cost alarms that fan out to SNS for email and Slack notifications. All AWS resources are deployed via CloudFormation, keeping the stack reproducible and manageable as code.

Use Case 2

Hardening the Infrastructure with Pump Secure

Daemo sells into consulting firms, which means their customers have specific expectations around data handling, encryption, and access controls. Pump Secure engaged in parallel with the router rollout so the new infrastructure would not introduce exposure that could slow down customer conversations.

Pump Secure covered six core control areas: tightening the S3 bucket policy to remove public access and restrict to VPC endpoints only, replacing the MongoDB Atlas open allowlist with a scoped EC2 IP allowlist, replacing wildcard full-access IAM policies with least-privilege equivalents, putting API Gateway behind WAF with API key auth and IP-based throttling, enforcing S3 SSE-KMS encryption at rest with TLS in transit, and moving compute into private subnets with restricted egress. Bundling this with the router meant Daemo got both improvements in a single coordinated effort rather than revisiting the architecture twice.

Pump’s impact

After the router went into production, Daemo's combined Sonnet and Opus spend on Bedrock dropped by close to 40 percent. The combined monthly run rate fell from roughly $8,656 to roughly $5,200, about $3,500 in monthly savings and an annualized run rate reduction of around $42,000. The bulk of those savings came from the router correctly identifying that a meaningful portion of Opus traffic could be served by Sonnet or Haiku with no drop in answer quality. The CloudWatch cost alarms have already caught two anomaly events that would otherwise have gone undetected until the next billing cycle.

Two lessons carried into the next engagement: Bedrock cost optimization is fundamentally different because unit cost per request is the only lever, and intelligent routing pays off faster than most customers expect, since request distribution is usually weighted toward queries that do not need top-tier reasoning. Cost optimization and security hardening also belong in the same engagement when the customer sells into regulated or compliance-sensitive industries.

Stop sending easy requests to your most expensive model

Daemo routes each request to the right Claude model and hardened the stack for its consulting-firm customers, all in one engagement.

Stop sending easy requests to your most expensive model

Daemo routes each request to the right Claude model and hardened the stack for its consulting-firm customers, all in one engagement.

Stop sending easy requests to your most expensive model

Daemo routes each request to the right Claude model and hardened the stack for its consulting-firm customers, all in one engagement.

Stop sending easy requests to your most expensive model

Daemo routes each request to the right Claude model and hardened the stack for its consulting-firm customers, all in one engagement.