Product Updates
Last updated: March 2026
v0.4.1
Token ledger & accounting hardening
- Added a token ledger section to billing settings to inspect usage events in one place.
- Improved token management for prompt runs and evaluation batches so accounting stays accurate in edge cases.
v0.4.0
Billing resilience & safer runs
- Added automatic Pro trial subscription setup for new organizations, so teams can start evaluating immediately with sensible defaults.
- Implemented token consumption and token balance services to enforce usage accounting and prevent runs when quota is exhausted.
- Hardened evaluation error handling in key run hooks/providers, giving clearer failure feedback during prompt and eval workflows.
- Improved privacy by masking request IDs in run statistics before rendering in the UI.
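Conceptually, the quota enforcement above works like the sketch below (a minimal illustration with made-up names, not the actual service API): a run's token cost is checked against the remaining balance before the run is allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBalance:
    """Hypothetical per-organization token balance for one billing period."""
    quota: int     # tokens granted for the period
    consumed: int  # tokens already used

    @property
    def remaining(self) -> int:
        return max(self.quota - self.consumed, 0)


class QuotaExhaustedError(Exception):
    pass


def consume_tokens(balance: TokenBalance, cost: int) -> TokenBalance:
    """Record a run's token cost, refusing the run when quota is exhausted."""
    if cost > balance.remaining:
        raise QuotaExhaustedError(
            f"run needs {cost} tokens but only {balance.remaining} remain"
        )
    return TokenBalance(balance.quota, balance.consumed + cost)
```

Keeping the balance immutable and returning a new value makes each consumption event easy to record as a ledger entry.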
v0.3.0
Evaluation intelligence
- Introduced LLM-as-a-judge evaluations so you can score prompt outputs using model-based rubric checks instead of manual review.
- Added first-class judge function support to make grading logic reusable across prompt versions and datasets.
- Improved evaluation workflows with clearer run feedback and result visibility while comparing quality across models.
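The judge-function idea can be sketched as follows (illustrative only; the prompt wording, scoring scale, and injected model call are assumptions, not the product's actual interface): a rubric is bound once, and the resulting function can grade any output.

```python
from typing import Callable


def make_judge(rubric: str, call_model: Callable[[str], str]) -> Callable[[str], int]:
    """Build a reusable judge: given a rubric and a model-call function,
    return a function that scores any candidate output from 1 to 5."""
    def judge(output: str) -> int:
        prompt = (
            "Score the answer from 1 (poor) to 5 (excellent) against the rubric.\n"
            f"Rubric: {rubric}\n"
            f"Answer: {output}\n"
            "Reply with a single digit."
        )
        reply = call_model(prompt).strip()
        score = int(reply[0])  # parse the leading digit, e.g. "4" or "4 - good"
        if not 1 <= score <= 5:
            raise ValueError(f"judge returned out-of-range score: {reply!r}")
        return score
    return judge
```

Because the model call is injected, the same grading logic can be reused across prompt versions and datasets, or stubbed out in tests.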
v0.2.1
Launch & discovery
- Launched the public homepage so anyone can understand what PromptEvals does before signing up.
- Added a contact page for onboarding and plan questions — reach us directly without digging for an email.
- Improved search engine presence with proper metadata, sitemap, and social previews.
v0.2.0
Billing & workspaces
- Subscribe to a plan and start a free trial directly in the app — no invoices, no back-and-forth.
- Token-based billing: pay only for the LLM runs you actually make, tracked per billing period.
- Organization workspaces keep your team's usage and billing cleanly separated.
- A billing dashboard shows current usage, limits, and remaining quota at a glance.
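The per-period tracking behind token-based billing amounts to grouping raw usage events by billing period, roughly as in this sketch (calendar-month periods are an assumption for illustration):

```python
from collections import defaultdict
from datetime import date


def usage_by_period(events: list[tuple[date, int]]) -> dict[str, int]:
    """Aggregate (timestamp, tokens) usage events into per-month totals,
    the kind of rollup a billing dashboard would display."""
    totals: dict[str, int] = defaultdict(int)
    for when, tokens in events:
        totals[when.strftime("%Y-%m")] += tokens
    return dict(totals)
```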
v0.1.0
Core platform
- Write prompts with full version history — iterate freely and roll back to any previous version at any time.
- Run your prompt on OpenAI, Anthropic, and Gemini models and compare outputs side-by-side in one place.
- Build an evaluation dataset and batch-test your prompt against every row at once — no manual copy-paste.
- Bring Your Own API Key: your credentials, your costs, no markup.
- Search across all your prompts instantly with full-text search.
- Sign in with just your email via OTP — no passwords to manage.
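Email OTP sign-in boils down to two steps, sketched below (a minimal illustration, not the production flow, which would also handle delivery, expiry, and rate limiting): generate a random numeric code, then verify the submitted code with a constant-time comparison.

```python
import secrets


def generate_otp(digits: int = 6) -> str:
    """Generate a cryptographically random numeric one-time code."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def verify_otp(submitted: str, expected: str) -> bool:
    """Compare codes in constant time to avoid timing side channels."""
    return secrets.compare_digest(submitted, expected)
```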