# CostHawk (Full Context)

> Extended context for LLM systems consuming CostHawk product and documentation content.
> Canonical: https://costhawk.ai/llms-full.txt
> Last-Updated: 2026-03-20
> Language: en-US

CostHawk is a free platform for monitoring and optimizing AI API usage and spend. It supports individual developers, teams, and organizations. There are no trials, no credit cards, and no paid tiers. CostHawk is 100% free.

## Canonical Discovery Files

- [Robots](https://costhawk.ai/robots.txt)
- [Sitemap](https://costhawk.ai/sitemap.xml)
- [LLMs Index](https://costhawk.ai/llms.txt)

---

## What CostHawk Does

CostHawk is a comprehensive AI cost monitoring and optimization platform. It gives developers, teams, and organizations full visibility into what they spend on AI APIs, where the money goes, and how to spend less without sacrificing quality.

- Tracks AI provider usage and spend over time across 15 providers and hundreds of models
- Aggregates usage across providers, projects, teams, and individual API keys
- Compares retail API spend vs flat-rate subscription plans (Claude Max, OpenAI Pro, etc.)
- Supports alerts, dashboards, anomaly detection, and organization-level visibility
- Starts with local-first MCP telemetry so teams can track costs without sharing provider admin keys
- Provides wrapped proxy keys for per-request cost attribution without changing provider SDKs
- Admin API sync for org-wide visibility into OpenAI and Anthropic usage
- Open-source MCP server with 20 tools for querying cost data from any AI assistant
- Public leaderboard for opt-in team AI usage rankings

---

## Pricing

CostHawk is completely free. There are no paid plans, no trials, no credit card requirements, and no usage limits. Every feature — dashboards, alerts, MCP server, wrapped keys, admin API sync, anomaly detection — is available at no cost.

---

## Access and Data Paths

CostHawk offers three data ingestion paths, each adding progressively more attribution detail:

### 1. MCP Telemetry (Local-First)

- Install the `costhawk` npm package and connect it to your AI coding assistant (Claude Code, Codex, Cursor, etc.)
- Usage metadata is parsed from local session transcripts — no provider admin keys needed
- Tracks tokens, models, sessions, and estimated cost
- Privacy-first: only usage metadata is sent, never prompts, responses, or code
- [MCP Server Overview](https://docs.costhawk.ai/mcp-server/overview)

### 2. Admin API Sync (Organization-Wide)

- Connect your OpenAI or Anthropic admin API key to pull org-wide usage data
- See every team member's usage, broken down by model and time period
- No per-user setup required — one admin key covers the entire org
- [Admin API Keys](https://docs.costhawk.ai/connections/admin-api-keys)

### 3. Wrapped Proxy Keys (Per-Request Attribution)

- Generate CostHawk-managed proxy keys that route traffic through CostHawk's tracking layer
- Every request is tagged with cost, model, latency, and custom metadata
- The original provider key never leaves CostHawk's secure environment
- Compatible with any OpenAI-compatible SDK — just swap the base URL
- [Wrapped Keys](https://docs.costhawk.ai/connections/wrapped-keys)

---

## Feature Descriptions

### Usage Tracking

Real-time, multi-provider usage tracking across every model you use. CostHawk aggregates token consumption and cost data by provider, model, project, team, and time period. The dashboard shows daily, weekly, and monthly trends with drill-down into individual sessions and API calls. Usage data can be filtered by date range, provider, model, or custom tags.

### Savings Calculator

Flat-rate subscription plans (Claude Max, OpenAI Pro, ChatGPT Plus) offer unlimited or high-volume access for a fixed monthly fee. CostHawk compares your actual retail API cost against what you would pay on these plans, showing you whether you are saving money or overpaying.
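The comparison reduces to a difference between your retail API total and the plan's flat fee. A minimal sketch with illustrative numbers (the fee and cost figures are placeholders, not live data):

```python
# Sketch: deciding whether a flat-rate plan beats pay-per-token retail cost.
# The $200/mo fee and the usage figure are illustrative, not live data.
def compare_plan(retail_api_cost: float, flat_fee: float) -> dict:
    """Positive savings means the subscription is cheaper than retail."""
    return {
        "retail_cost": retail_api_cost,
        "flat_fee": flat_fee,
        "savings": retail_api_cost - flat_fee,
        "verdict": "saving" if retail_api_cost > flat_fee else "overpaying",
    }

# A month where retail usage would have cost $312.40 against a $200 plan:
print(compare_plan(retail_api_cost=312.40, flat_fee=200.00))
```

A positive `savings` value means the plan is paying for itself; a negative one means pay-per-token would have been cheaper that period.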
The savings breakdown shows per-model usage so you can see exactly which models drive your subscription ROI.

### Budget Alerts and Anomaly Detection

Set spending thresholds per project, team, or time period. CostHawk monitors your usage in real time and triggers alerts when you approach or exceed your budget. Anomaly detection automatically identifies unusual spending patterns — sudden cost spikes, gradual upward drift, or per-key anomalies — before they become expensive surprises. Alerts can be delivered via email, Slack webhook, Discord webhook, Microsoft Teams, or PagerDuty.

### Wrapped Proxy Keys

Generate CostHawk-managed API keys that proxy traffic to your actual provider keys. Every request routed through a wrapped key is automatically tracked with full cost attribution — no SDK changes, no middleware, no code instrumentation. Just swap the base URL in your provider SDK configuration. Wrapped keys support policy controls including rate limiting and model allowlists.

### Admin API Sync

Connect your OpenAI or Anthropic organization admin API key to pull complete org-wide usage data. This gives finance and ops teams visibility into total AI spend across every team member without requiring individual setup. Data syncs automatically and includes per-user, per-model breakdowns.

### MCP Server

The CostHawk MCP server is an open-source npm package that connects any MCP-compatible AI assistant to your CostHawk cost data. It provides 20 tools for querying usage, savings, alerts, pricing, and more — all accessible through natural language in Claude Code, Codex, Cursor, Windsurf, or any other MCP host. Install with `npm exec --yes costhawk@latest -- --login`.

### Leaderboard

A public opt-in leaderboard where developers can compare their AI coding assistant usage. Shows Claude Code and Codex usage rankings. Participation is voluntary and can be toggled on or off at any time.
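For the wrapped proxy keys described above, the only client-side change is the base URL. A hedged sketch of the mechanics, building (but not sending) an OpenAI-compatible request; the proxy endpoint and the `chk_` key prefix are illustrative assumptions, not documented CostHawk values:

```python
# Sketch: routing an OpenAI-compatible request through a wrapped proxy key.
# The proxy URL and the "chk_" key prefix below are illustrative assumptions.
import json
import urllib.request

COSTHAWK_BASE_URL = "https://proxy.costhawk.example/v1"  # assumed proxy endpoint
WRAPPED_KEY = "chk_xxxxxxxx"  # CostHawk-managed key; the real provider key stays server-side

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completion request against the proxy without sending it."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{COSTHAWK_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {WRAPPED_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("gpt-4o-mini", "hello")
print(req.full_url)
```

With the official OpenAI SDKs the same swap is the `base_url` (Python) or `baseURL` (Node.js) constructor option; no other application code changes.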
### OpenTelemetry Integration

CostHawk supports OpenTelemetry (OTLP) for teams that want to integrate AI cost data into their existing observability stack. Export traces, metrics, and cost metadata to any OTLP-compatible backend alongside your application telemetry.

---

## MCP Server Tools

The CostHawk MCP server (`costhawk` on npm) provides the following tools:

### Usage Tools

- **costhawk_get_usage_summary** — Get a summary of AI API usage and costs over a time period. Supports preset periods (last_24h, today, yesterday, last_7d, last_30d) or custom date ranges.
- **costhawk_get_usage_by_tag** — Break down costs by metadata tags like project, environment, or team. Supports date range filtering and result limits.

### Savings Tools

- **costhawk_get_savings** — Show savings vs retail pricing for flat-rate subscriptions like Claude Max or OpenAI Pro. Supports daily, weekly, or monthly periods.
- **costhawk_list_subscriptions** — List active flat-rate subscriptions configured in the account.
- **costhawk_get_savings_breakdown** — Get per-model usage and retail cost breakdown for savings analysis.

### Monitoring Tools

- **costhawk_detect_anomalies** — Find cost spikes and unusual activity patterns across all tracked usage.
- **costhawk_list_alerts** — View recent alerts including budget warnings, cost spikes, and anomaly notifications. Supports filtering by unread status.

### Webhook Tools

- **costhawk_list_webhooks** — List all configured webhooks for receiving cost alerts.
- **costhawk_create_webhook** — Create a new webhook for Slack, Discord, Microsoft Teams, PagerDuty, or custom HTTP endpoints. Subscribe to specific event types (cost_spike, budget_alert, anomaly, etc.).

### Pricing Tools

- **costhawk_get_model_pricing** — Look up current pricing for AI models across all tracked providers. Filter by provider name.

### Integration Tools

- **costhawk_list_integrations** — List connected provider integrations and their sync status.
- **costhawk_get_proxy_guide** — Get setup instructions for wrapped proxy keys.

### Local Sync Tools (Claude Code)

- **costhawk_sync_claude_code_usage** — Sync Claude Code usage from local transcript files and upload usage metadata to CostHawk. Supports dry-run preview.
- **costhawk_list_claude_code_sessions** — List Claude Code sessions discovered locally before syncing.
- **costhawk_get_local_claude_code_usage** — Compute Claude Code usage locally without uploading anything. Optionally estimate savings for a subscription plan.

### Local Sync Tools (Codex)

- **costhawk_sync_codex_usage** — Sync OpenAI Codex CLI usage from local session logs and upload usage metadata to CostHawk. Supports dry-run preview.
- **costhawk_list_codex_sessions** — List Codex sessions discovered locally before syncing.
- **costhawk_get_local_codex_usage** — Compute Codex usage locally without uploading anything.

### ROI Tools

- **costhawk_get_local_roi_report** — Generate a local ROI report combining usage data with productivity metrics.

### Privacy

When syncing local data, CostHawk only uploads usage metadata: session ID (hashed), project ID (hashed), model name, token counts (input, output, cache read, cache creation), and timestamp. No code, prompts, file paths, or project names are ever sent.

---

## Supported Providers

CostHawk tracks pricing and usage across the following AI providers:

| Provider | Label | Description |
|----------|-------|-------------|
| OpenAI | OpenAI | GPT-4o, GPT-4.1, o1, o3, o4-mini, and all OpenAI models. Flagship reasoning and chat models. |
| Anthropic | Anthropic | Claude Opus, Sonnet, Haiku family. Coding, chat, and high-context workloads. |
| Google | Google | Gemini family including Gemini 2.5 Pro, Flash, and multimodal models. |
| xAI | xAI / Grok | Grok models for teams comparing xAI pricing against the broader market. |
| Cohere | Cohere | Command and Embed models for enterprise RAG and search workloads. |
| Hugging Face | Hugging Face | Inference API and hosted model endpoints across open-source models. |
| Azure OpenAI | Azure OpenAI | Azure-hosted OpenAI models for teams on enterprise Azure contracts. |
| AWS Bedrock | AWS Bedrock | Bedrock-hosted model pricing for teams standardizing on AWS infrastructure. |
| Perplexity | Perplexity | Sonar models for search-augmented generation. |
| Groq | Groq | Ultra-low-latency inference for Llama, Mixtral, and other open models. |
| Mistral | Mistral | Mistral Large, Medium, and specialized models from Mistral AI. |
| DeepSeek | DeepSeek | DeepSeek models for cost-sensitive reasoning and coding workloads. |
| Together AI | Together AI | Open-source model hosting with competitive per-token pricing. |
| Meta | Meta | Llama family models when accessed through direct Meta endpoints. |
| Custom | Custom | Any custom or self-hosted model endpoint with manual pricing configuration. |

Provider pricing pages are published at `https://costhawk.ai/models/{provider-slug}` for each provider with active pricing data. Provider sections on the main catalog are anchor-addressable at `https://costhawk.ai/models#provider-{provider-slug}`. Model rows on provider pages are anchor-addressable at `https://costhawk.ai/models/{provider-slug}#model-{provider-slug}-{model-slug}`.

---

## Use Cases

### Individual Developers Tracking Personal AI Spend

Solo developers using Claude Code, Codex, Cursor, or direct API calls can install CostHawk's MCP server to automatically track every token and dollar spent. See whether your Claude Max subscription is actually saving you money, compare models by cost-effectiveness, and catch runaway API bills before they hit.

### Teams Needing Org-Wide Cost Visibility

Engineering managers and team leads can connect their OpenAI or Anthropic admin API key to see total org-wide AI spend broken down by user, model, and time period. No individual setup required — one admin key covers everyone.
Budget alerts notify you when team spend approaches thresholds.

### Finance and Ops Teams Doing AI Cost Allocation

Finance teams can use CostHawk's tag-based cost breakdowns to allocate AI spend to specific projects, teams, cost centers, or customers. Wrapped proxy keys enable per-request attribution for chargeback and showback reporting. Export data via API for integration with existing FinOps tooling.

### Developers Comparing Flat-Rate Subscriptions vs Pay-Per-Token

Claude Max ($100-200/mo), OpenAI Pro ($200/mo), and other flat-rate plans promise unlimited or high-volume access. CostHawk calculates your actual retail API cost and shows you exactly how much you are saving (or overpaying) on each subscription, broken down by model and time period.

### AI Product Teams Managing Cost Per Query

Teams building AI-powered products need to understand the unit economics of every user query. CostHawk tracks cost-per-query across models and providers, helping you optimize model routing, implement prompt caching, and maintain healthy margins as usage scales.

### Platform Engineers Building AI Infrastructure

Platform teams standardizing AI access across the organization can use CostHawk's wrapped proxy keys as a lightweight AI gateway — adding cost tracking, rate limiting, and model allowlists without building custom middleware.

---

## Technical Architecture

### Local-First MCP Telemetry

CostHawk's primary data path is local-first. The MCP server reads usage data from local session transcripts (Claude Code's `~/.claude/` directory, Codex's session logs) and computes token counts, model usage, and estimated costs locally. Only aggregated usage metadata is uploaded — never prompts, responses, file paths, or code content. This means teams can track AI costs without sharing any provider credentials.

### Admin API Sync

For organization-wide visibility, CostHawk connects to OpenAI and Anthropic admin APIs using read-only admin keys.
These sync jobs pull aggregated usage data (tokens, cost, model, user) on a schedule. Admin keys are encrypted at rest using AES-256-CBC with unique IVs and stored in CostHawk's PostgreSQL database. The sync process respects provider rate limits and includes circuit-breaker logic for API failures.

### Wrapped Proxy Keys

Wrapped keys are CostHawk-managed API keys that proxy requests to the actual provider. When an application sends a request using a wrapped key, CostHawk's proxy layer:

1. Decrypts the real provider key from secure storage
2. Forwards the request to the provider API
3. Records the full request metadata (model, tokens, latency, cost, custom tags)
4. Returns the provider's response unchanged

The proxy supports any OpenAI-compatible SDK — applications just change the base URL. No code changes, no middleware, no SDK wrappers.

### Privacy-First Design

CostHawk is built around a strict privacy boundary:

- **Collected**: Token counts, model names, session timestamps, cost calculations, latency metrics
- **Never collected**: Prompt content, response content, code, file paths, project names (only hashed project IDs are sent), conversation content
- Session IDs and project IDs are hashed before transmission
- All data is encrypted in transit (TLS) and at rest (AES-256)
- Users can delete their data at any time

### Infrastructure

- **Application**: Next.js on Node.js, deployed on Railway
- **Database**: PostgreSQL on Supabase with Prisma ORM
- **Authentication**: Clerk with organization support
- **MCP Server**: Open-source Node.js package on npm (`costhawk`)
- **Encryption**: AES-256-CBC for API key storage, unique IV per encryption

---

## Product and Marketing Pages

- [Homepage](https://costhawk.ai): Product overview
- [AI Cost Glossary](https://costhawk.ai/glossary): 58-term glossary covering AI costs, LLM pricing, monitoring, infrastructure, and FinOps.
Each term includes a deep-dive explanation, real pricing data, code examples, and 8-10 FAQs with structured data (DefinedTerm + FAQPage + BreadcrumbList + Article schema)
- [Resources](https://costhawk.ai/resources): Hub for guides, documentation, and tools for managing AI API costs
- [Leaderboard](https://costhawk.ai/leaderboard): Public opt-in usage leaderboard
- [Security](https://costhawk.ai/security): Security posture, data handling, and trust boundary
- [Models](https://costhawk.ai/models): Supported providers, models, and tracking paths
- [OpenAI Pricing](https://costhawk.ai/models/openai): Dedicated OpenAI model pricing page
- [Anthropic Pricing](https://costhawk.ai/models/anthropic): Dedicated Anthropic model pricing page
- [Google Pricing](https://costhawk.ai/models/google): Dedicated Google model pricing page
- Provider pricing pages are published under `https://costhawk.ai/models/{provider-slug}` when active pricing exists
- [Sign Up](https://costhawk.ai/sign-up): Create a free account
- [Terms of Service](https://costhawk.ai/terms): Terms governing service usage
- [Privacy Policy](https://costhawk.ai/privacy): Privacy and data handling policy

---

## Primary Documentation

- [CostHawk Docs](https://docs.costhawk.ai): Docs home
- [Quickstart](https://docs.costhawk.ai/quickstart): Quick setup
- [REST API Reference](https://docs.costhawk.ai/api-reference): API endpoints and usage
- [MCP Server Overview](https://docs.costhawk.ai/mcp-server/overview): MCP architecture and setup
- [MCP Tools Reference](https://docs.costhawk.ai/mcp-server/tools): Complete reference for all 20 MCP tools
- [MCP Installation](https://docs.costhawk.ai/mcp-server/installation): Step-by-step MCP server installation
- [MCP Examples](https://docs.costhawk.ai/mcp-server/examples): Example queries and workflows
- [MCP Operations](https://docs.costhawk.ai/mcp-server/operations): Operational guidance for MCP server management

---

## Installation

- [npm package: costhawk](https://www.npmjs.com/package/costhawk): CLI install and login flow
- Install: `npm exec --yes costhawk@latest -- --login`
- The MCP server auto-configures for Claude Code, Cursor, Windsurf, and other MCP hosts

---

## AI Cost Glossary — All 58 Terms

The CostHawk AI Cost Glossary is a comprehensive reference covering billing, metering, optimization, infrastructure, observability, and FinOps concepts for AI APIs. Each term page includes a deep-dive explanation, real-world pricing data, code examples, and 8-10 FAQs with structured data markup.

### Billing & Pricing

- https://costhawk.ai/glossary/token — The fundamental billing unit for large language models. Every API call is metered in tokens, which are sub-word text fragments produced by BPE tokenization. Pricing is quoted per 1M tokens.
- https://costhawk.ai/glossary/token-pricing — The per-token cost model used by AI API providers, with separate rates for input tokens, output tokens, and cached tokens. Pricing varies by model tier and capability.
- https://costhawk.ai/glossary/input-vs-output-tokens — The two token directions in every LLM API call, each priced differently. Output tokens cost 3-5x more than input tokens across most providers.
- https://costhawk.ai/glossary/cost-per-query — The total cost of a single end-user request to your AI-powered application, including all token consumption, tool calls, and retrieval steps.
- https://costhawk.ai/glossary/cost-per-token — The unit price an AI provider charges for processing a single token, quoted per million tokens. Ranges from $0.075/1M to $60/1M across providers — an 800x spread.
- https://costhawk.ai/glossary/pay-per-token — The dominant usage-based pricing model for AI APIs where you pay only for the tokens you consume, with no upfront commitment or minimum spend.
- https://costhawk.ai/glossary/provisioned-throughput — Pre-purchased dedicated LLM compute capacity that guarantees consistent performance and can reduce per-token costs at scale.
- https://costhawk.ai/glossary/max-tokens — The API parameter that limits the maximum number of output tokens a model can generate in a single response, directly controlling the cost ceiling per request.
- https://costhawk.ai/glossary/token-budget — Spending limits applied per project, team, or time period to prevent uncontrolled AI API costs and protect against runaway spend.
- https://costhawk.ai/glossary/roi — The financial return generated by AI investments relative to their total cost. AI ROI is uniquely challenging to measure because benefits are often indirect and distributed.
- https://costhawk.ai/glossary/tco — The complete, all-in cost of running AI in production over its full lifecycle. TCO extends far beyond API fees to include engineering, infrastructure, and operational overhead.
- https://costhawk.ai/glossary/unit-economics — The cost and revenue associated with a single unit of your AI-powered product — whether that unit is a query, a user session, or a completed task.
- https://costhawk.ai/glossary/chargeback-showback — Two complementary FinOps models for assigning AI cost accountability across teams and business units. Showback reports costs without billing; chargeback bills directly.

### Usage & Metering

- https://costhawk.ai/glossary/llm — A neural network with billions of parameters trained on massive text corpora to understand and generate human language. The core technology behind AI API billing.
- https://costhawk.ai/glossary/context-window — The maximum number of tokens a model can process in a single request, encompassing both the input prompt and the generated output.
- https://costhawk.ai/glossary/tokenization — The process of splitting raw text into discrete sub-word units called tokens using algorithms like Byte-Pair Encoding (BPE). How providers meter usage.
- https://costhawk.ai/glossary/embedding — A dense vector representation of text produced by a specialized neural network model.
Embeddings capture semantic meaning and are priced separately from generation.
- https://costhawk.ai/glossary/inference — The process of running a trained model to generate predictions, classifications, or text output from new input. The billable event in AI API pricing.
- https://costhawk.ai/glossary/temperature — A sampling parameter (typically 0-2) that controls the randomness and creativity of LLM outputs. Does not directly affect cost but influences output length.
- https://costhawk.ai/glossary/rate-limiting — Provider-enforced caps on API requests and tokens per minute that throttle throughput and return HTTP 429 errors when exceeded.

### Optimization

- https://costhawk.ai/glossary/rag — An architecture pattern that combines a large language model with an external knowledge retrieval system. Reduces hallucination but adds retrieval and embedding costs.
- https://costhawk.ai/glossary/prompt-caching — A provider-side optimization that caches repeated prompt prefixes to reduce input token costs by 50-90% on subsequent requests with the same prefix.
- https://costhawk.ai/glossary/semantic-caching — An application-level caching strategy that uses embedding similarity to serve previously generated responses for semantically equivalent queries.
- https://costhawk.ai/glossary/batch-api — Asynchronous API endpoints that process large volumes of LLM requests at a 50% discount in exchange for longer turnaround times (typically 24 hours).
- https://costhawk.ai/glossary/model-routing — Dynamically directing AI requests to different models based on task complexity, cost constraints, and quality requirements to optimize the cost-quality tradeoff.
- https://costhawk.ai/glossary/prompt-compression — Techniques for reducing the token count of prompts while preserving semantic meaning — cutting input costs by 40-70% without degrading output quality.
- https://costhawk.ai/glossary/prompt-engineering — The practice of designing, structuring, and iterating on text inputs sent to large language models to elicit better outputs. Good prompts reduce token waste.
- https://costhawk.ai/glossary/fine-tuning — The process of further training a pre-trained large language model on a custom dataset to specialize it. Fine-tuned models can be cheaper per query by reducing prompt length.
- https://costhawk.ai/glossary/cost-anomaly-detection — Automated detection of unusual AI spending patterns — sudden spikes, gradual drift, and per-key anomalies — before they become expensive problems.
- https://costhawk.ai/glossary/ai-cost-allocation — The practice of attributing AI API costs to specific teams, projects, features, or customers — enabling accountability and optimization.

### Infrastructure

- https://costhawk.ai/glossary/transformer — The foundational neural network architecture behind all modern large language models. Introduced in 2017, transformers use self-attention to process sequences in parallel.
- https://costhawk.ai/glossary/foundation-model — A large, general-purpose AI model pre-trained on broad data that serves as the base for downstream applications. Foundation models are the most expensive to train and host.
- https://costhawk.ai/glossary/multi-modal-model — An AI model capable of processing and generating content across multiple modalities — text, images, audio, and video. Multi-modal models have complex pricing tiers.
- https://costhawk.ai/glossary/agentic-ai — AI systems that autonomously plan, reason, and execute multi-step tasks by chaining multiple LLM calls, tool invocations, and decision loops. Agents multiply API costs.
- https://costhawk.ai/glossary/api-gateway — A centralized entry point for API traffic that handles routing, authentication, rate limiting, and request transformation for AI API calls.
- https://costhawk.ai/glossary/llm-gateway — An AI-specific API gateway purpose-built for routing LLM requests across providers. Adds model routing, cost tracking, and fallback capabilities.
- https://costhawk.ai/glossary/llm-proxy — A transparent intermediary that sits between your application and LLM providers, forwarding requests while adding tracking, caching, and policy controls.
- https://costhawk.ai/glossary/load-balancing — Distributing LLM API requests across multiple provider accounts, endpoints, or models to optimize for cost, latency, and availability.
- https://costhawk.ai/glossary/failover — Automatically switching to a backup LLM provider when the primary fails or becomes unavailable. Failover prevents user-facing outages but can increase costs.
- https://costhawk.ai/glossary/api-key-management — Securing, rotating, scoping, and tracking API credentials across AI providers. Effective key management is the foundation of AI cost control.
- https://costhawk.ai/glossary/wrapped-keys — Proxy API keys that route provider SDK traffic through a cost tracking layer. The original provider key never leaves the secure environment.
- https://costhawk.ai/glossary/webhook — An HTTP callback that pushes real-time notifications when events occur — cost threshold breaches, anomaly detection alerts, and usage milestones.
- https://costhawk.ai/glossary/gpu-instance — Cloud-hosted GPU hardware used for running LLM inference or training workloads. GPU instances represent the alternative to pay-per-token API pricing.
- https://costhawk.ai/glossary/serverless-inference — Running LLM inference without managing GPU infrastructure. Serverless platforms automatically provision hardware and bill per token or per request.

### Observability

- https://costhawk.ai/glossary/llm-observability — The practice of monitoring, tracing, and analyzing LLM-powered applications in production across every dimension that affects cost, quality, and reliability.
- https://costhawk.ai/glossary/latency — The total elapsed time between sending a request to an LLM API and receiving the complete response. Latency decomposes into time-to-first-token and decode time.
- https://costhawk.ai/glossary/throughput — The volume of requests or tokens an LLM system processes per unit of time, measured as requests per second (RPS) or tokens per second (TPS).
- https://costhawk.ai/glossary/p95-p99-latency — Percentile latency metrics that capture the tail-end performance of LLM API calls. P95 means 95% of requests complete faster than this threshold.
- https://costhawk.ai/glossary/tracing — The practice of recording the full execution path of an LLM request — from prompt construction through model inference to response delivery.
- https://costhawk.ai/glossary/spans — Individual units of work within a distributed trace. Each span records a single operation such as an LLM call, a retrieval query, or a tool invocation.
- https://costhawk.ai/glossary/ttft — Time to First Token: the latency from sending an LLM API request to receiving the first token of the streamed response. A key UX metric for chat applications.
- https://costhawk.ai/glossary/tps — Tokens Per Second: the rate at which an LLM generates output tokens during the decode phase. TPS determines how fast streaming responses appear to the user.
- https://costhawk.ai/glossary/evals — Systematic evaluation of LLM output quality using automated metrics, human review, or LLM-as-judge methodologies. Evals connect cost optimization to quality outcomes.
- https://costhawk.ai/glossary/logging — Recording LLM request and response metadata — tokens consumed, model used, latency, cost, and status — for debugging, compliance, and cost analysis.
- https://costhawk.ai/glossary/dashboards — Visual interfaces for monitoring AI cost, usage, and performance metrics in real time. The command center for AI cost management.
- https://costhawk.ai/glossary/alerting — Automated notifications triggered by cost thresholds, usage anomalies, or performance degradation in AI systems. The early warning system for cost overruns.
- https://costhawk.ai/glossary/opentelemetry — An open-source observability framework providing a vendor-neutral standard (OTLP) for collecting traces, metrics, and logs from AI applications.
- https://costhawk.ai/glossary/mcp — Model Context Protocol: an open protocol for connecting AI assistants to external tools and data sources via a standardized client-server architecture.

---

## Frequently Asked Questions

### What data does CostHawk collect?

CostHawk collects only usage metadata: token counts, model names, session timestamps, cost calculations, and latency metrics. It never collects prompt content, response content, code, file paths, or conversation content. Session IDs and project IDs are hashed before transmission.

### Does CostHawk work with my AI coding assistant?

Yes. CostHawk's MCP server works with any MCP-compatible host including Claude Code, OpenAI Codex, Cursor, Windsurf, and others. Local sync tools automatically detect and parse session transcripts.

### Do I need to share my API keys?

No. CostHawk's primary data path (MCP telemetry) works entirely from local session data — no API keys needed. Admin API sync and wrapped proxy keys are optional paths that require keys, but those keys are encrypted at rest and never exposed.

### Is CostHawk open source?

The CostHawk MCP server is open source and published on npm and GitHub. The CostHawk web application (dashboard, API, database) is proprietary.

### How accurate is the cost tracking?

CostHawk uses real-time pricing data from each provider's published rate cards. For MCP telemetry, costs are estimated from token counts and current model pricing. For wrapped proxy keys, costs are calculated from actual API responses. For admin API sync, costs come directly from the provider's usage reporting.
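The MCP-telemetry estimate mentioned in the last answer reduces to multiplying token counts by per-million-token rates. A minimal sketch; the model names and rates below are placeholders, not CostHawk's live rate-card data:

```python
# Sketch: deriving an estimated request cost from token counts and
# per-1M-token rates. Models and rates are placeholders, not live data.
PRICING_PER_1M = {  # model: (input_rate, output_rate) in USD per 1M tokens
    "example-small": (0.15, 0.60),
    "example-large": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICING_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(estimate_cost("example-large", 12_000, 2_000))  # 0.066
```

Wrapped proxy keys and admin API sync avoid the estimation step entirely, since token counts and costs come back from the provider itself.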
---

## Optional

- [GitHub (MCP Server)](https://github.com/cdilling/costhawk-mcp-server): Open-source repository