Compare closed-source API spend against open models served per token on Fireworks AI.
Savings are estimated from input/output token usage and an approximate cache hit rate. See the full math below.
Results are general estimates for internal discussion only. Fireworks model prices are from the published serverless per-token pricing (Standard serving path). Closed-source prices are representative examples. Fireworks does not guarantee any particular cost savings.
No black box. Here is exactly how the calculator turns your monthly spend into a projected savings figure on Fireworks.
Each model is priced per 1M tokens, with separate rates for input, cached input, and output tokens. The cache hit rate, the share of input tokens billed at the cached rate, comes from your workload type.
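A minimal Python sketch of the per-request cost, assuming the cache hit rate is the fraction of input tokens billed at the cached rate and that token counts are per-request averages. `ModelPrice` and `cost_per_request` are hypothetical names, not a Fireworks API, and the prices in the example are illustrative placeholders, not published rates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelPrice:
    """USD prices per 1M tokens (hypothetical helper; example values are made up)."""
    input: float
    output: float
    cached_input: Optional[float] = None  # None means no published cached rate

    def cached_rate(self) -> float:
        # With no cached rate, cached tokens are billed at the plain input rate.
        return self.input if self.cached_input is None else self.cached_input


def cost_per_request(price: ModelPrice, input_tokens: float,
                     output_tokens: float, cache_hit_rate: float) -> float:
    """Blended USD cost of one average request."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate   # input tokens assumed served from cache
    uncached = input_tokens - cached         # input tokens billed at the full rate
    dollars_times_1m = (uncached * price.input
                        + cached * price.cached_rate()
                        + output_tokens * price.output)
    return dollars_times_1m / 1_000_000


# Hypothetical closed model: $3 input, $0.30 cached, $15 output per 1M tokens.
closed = ModelPrice(input=3.00, output=15.00, cached_input=0.30)
print(cost_per_request(closed, input_tokens=2_000, output_tokens=500,
                       cache_hit_rate=0.5))
```

Keeping the prices in per-1M units until the final division mirrors how the rates are published and avoids tiny per-token floats.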
Using your input and output token usage, we compute the closed model's cost per request, then back out how many requests your current monthly spend buys.
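The back-out step is a single division. This sketch assumes the same blended per-request cost as the calculator's pricing step; `blended_cost_per_request` and `implied_monthly_requests` are hypothetical helper names, and the prices and spend are illustrative.

```python
def blended_cost_per_request(input_price: float, cached_price: float,
                             output_price: float, input_tokens: float,
                             output_tokens: float, cache_hit_rate: float) -> float:
    """USD per request from per-1M-token prices (hypothetical helper)."""
    cached = input_tokens * cache_hit_rate
    return ((input_tokens - cached) * input_price
            + cached * cached_price
            + output_tokens * output_price) / 1_000_000


def implied_monthly_requests(monthly_spend: float, cost_per_request: float) -> float:
    """How many average requests the current monthly spend pays for."""
    if cost_per_request <= 0:
        raise ValueError("cost_per_request must be positive")
    return monthly_spend / cost_per_request


# Hypothetical closed-model prices ($/1M): input 3.00, cached 0.30, output 15.00.
closed_cpr = blended_cost_per_request(3.00, 0.30, 15.00, 2_000, 500, 0.5)
requests = implied_monthly_requests(10_800.0, closed_cpr)
print(f"{requests:,.0f} requests/month")
```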
Hold that request volume fixed, re-price it on the Fireworks model, and take the difference: that is the projected savings.
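A small sketch of the final step, assuming both per-request costs have already been computed. `project_savings` and `SavingsEstimate` are hypothetical names, and the dollar figures are made-up inputs.

```python
from typing import NamedTuple


class SavingsEstimate(NamedTuple):
    fireworks_spend: float
    savings: float
    savings_pct: float


def project_savings(monthly_spend: float, closed_cpr: float,
                    fireworks_cpr: float) -> SavingsEstimate:
    """Hold request volume fixed and re-price it on the Fireworks model."""
    if monthly_spend <= 0 or closed_cpr <= 0:
        raise ValueError("monthly_spend and closed_cpr must be positive")
    requests = monthly_spend / closed_cpr        # implied monthly volume
    fireworks_spend = requests * fireworks_cpr   # same volume, Fireworks price
    savings = monthly_spend - fireworks_spend    # negative if Fireworks costs more
    return SavingsEstimate(fireworks_spend, savings, savings / monthly_spend)


est = project_savings(monthly_spend=10_800.0, closed_cpr=0.0108, fireworks_cpr=0.0027)
print(est)
```

Because the request count cancels out, the savings percentage reduces to `1 - fireworks_cpr / closed_cpr`; monthly spend only scales the dollar figure.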
| Assumption | Value | Why |
|---|---|---|
| Cache hit rate — Chat & assistants | 50% | Long shared system prompt / context reused across turns |
| Cache hit rate — Document processing | 10% | Mostly unique input per request, little reuse |
| Cache hit rate — Agentic workflows | 80% | Many chained calls share a growing, stable prefix |
| Fireworks pricing | Standard path | Published per-token serverless pricing (input / cached / output per 1M) |
| Models without a cached rate | Cached rate = input rate | No prompt-cache discount is assumed for that model |
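Putting the steps and the assumptions table together, here is an end-to-end sketch. The cache hit rates come from the table; the workload keys, function names, token counts, and all prices are hypothetical, and the example Fireworks price deliberately omits a cached rate to exercise the "cached rate = input rate" row.

```python
from typing import Optional, Tuple

# Cache hit rates from the assumptions table; the dict keys are hypothetical.
CACHE_HIT_RATE = {
    "chat": 0.50,                 # Chat & assistants
    "document_processing": 0.10,  # Document processing
    "agentic": 0.80,              # Agentic workflows
}

# (input, cached input or None, output) in USD per 1M tokens.
Price = Tuple[float, Optional[float], float]


def cost_per_request(price: Price, input_tokens: float, output_tokens: float,
                     hit_rate: float) -> float:
    input_rate, cached_rate, output_rate = price
    if cached_rate is None:          # no cached rate: bill cached tokens at input rate
        cached_rate = input_rate
    cached = input_tokens * hit_rate
    return ((input_tokens - cached) * input_rate + cached * cached_rate
            + output_tokens * output_rate) / 1_000_000


def estimate(workload: str, monthly_spend: float, input_tokens: float,
             output_tokens: float, closed: Price, fireworks: Price) -> float:
    """Projected monthly savings in USD (negative means Fireworks costs more)."""
    hit = CACHE_HIT_RATE[workload]
    closed_cpr = cost_per_request(closed, input_tokens, output_tokens, hit)
    fw_cpr = cost_per_request(fireworks, input_tokens, output_tokens, hit)
    requests = monthly_spend / closed_cpr   # requests your current spend buys
    return monthly_spend - requests * fw_cpr


# Hypothetical prices, not published rates.
savings = estimate("agentic", 23_400.0, 10_000, 1_000,
                   closed=(3.00, 0.30, 15.00), fireworks=(0.90, None, 0.90))
print(f"${savings:,.2f} per month")
```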