Product Updates
Last updated: March 2026
v0.4.1
Token ledger & accounting hardening
- Added a token ledger section to billing settings to inspect usage events in one place.
- Improved token management for prompt runs and evaluation batches so accounting stays accurate in edge cases.
v0.4.0
Billing resilience & safer runs
- Added automatic Pro trial subscription setup for new organizations, so teams can start evaluating immediately with sensible defaults.
- Implemented token consumption and token balance services to enforce usage accounting and prevent runs when quota is exhausted.
- Hardened evaluation error handling in key run hooks/providers, giving clearer failure feedback during prompt and eval workflows.
- Improved privacy by masking request IDs in run statistics before rendering in the UI.
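Conceptually, the quota enforcement above works like the sketch below (a minimal illustration with made-up names, not the actual service API): a run's token cost is checked against the remaining balance before the run is allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBalance:
    """Hypothetical per-organization token balance for one billing period."""
    quota: int     # tokens granted for the period
    consumed: int  # tokens already used

    @property
    def remaining(self) -> int:
        return max(self.quota - self.consumed, 0)


class QuotaExhaustedError(Exception):
    pass


def consume_tokens(balance: TokenBalance, cost: int) -> TokenBalance:
    """Record a run's token cost, refusing the run when quota is exhausted."""
    if cost > balance.remaining:
        raise QuotaExhaustedError(
            f"run needs {cost} tokens but only {balance.remaining} remain"
        )
    return TokenBalance(balance.quota, balance.consumed + cost)
```

Keeping the balance immutable and returning a new value makes each consumption event easy to record as a ledger entry.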
v0.3.0
Evaluation intelligence
- Introduced LLM-as-a-judge evaluations so you can score prompt outputs using model-based rubric checks instead of manual review.
- Added first-class judge function support to make grading logic reusable across prompt versions and datasets.
- Improved evaluation workflows with clearer run feedback and result visibility while comparing quality across models.
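The judge-function idea can be sketched as follows (illustrative only; the prompt wording, scoring scale, and injected model call are assumptions, not the product's actual interface): a rubric is bound once, and the resulting function can grade any output.

```python
from typing import Callable


def make_judge(rubric: str, call_model: Callable[[str], str]) -> Callable[[str], int]:
    """Build a reusable judge: given a rubric and a model-call function,
    return a function that scores any candidate output from 1 to 5."""
    def judge(output: str) -> int:
        prompt = (
            "Score the answer from 1 (poor) to 5 (excellent) against the rubric.\n"
            f"Rubric: {rubric}\n"
            f"Answer: {output}\n"
            "Reply with a single digit."
        )
        reply = call_model(prompt).strip()
        score = int(reply[0])  # parse the leading digit, e.g. "4" or "4 - good"
        if not 1 <= score <= 5:
            raise ValueError(f"judge returned out-of-range score: {reply!r}")
        return score
    return judge
```

Because the model call is injected, the same grading logic can be reused across prompt versions and datasets, or stubbed out in tests.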
v0.2.1
Launch & discovery
- Launched the public homepage so anyone can understand what PromptEvals does before signing up.
- Added a contact page for onboarding and plan questions — reach us directly without digging for an email.
- Improved search engine presence with proper metadata, sitemap, and social previews.
v0.2.0
Billing & workspaces
- Subscribe to a plan and start a free trial directly in the app — no invoices, no back-and-forth.
- Token-based billing: pay only for the LLM runs you actually make, tracked per billing period.
- Organization workspaces keep your team's usage and billing cleanly separated.
- A billing dashboard shows current usage, limits, and remaining quota at a glance.
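The per-period tracking behind token-based billing amounts to grouping raw usage events by billing period, roughly as in this sketch (calendar-month periods are an assumption for illustration):

```python
from collections import defaultdict
from datetime import date


def usage_by_period(events: list[tuple[date, int]]) -> dict[str, int]:
    """Aggregate (timestamp, tokens) usage events into per-month totals,
    the kind of rollup a billing dashboard would display."""
    totals: dict[str, int] = defaultdict(int)
    for when, tokens in events:
        totals[when.strftime("%Y-%m")] += tokens
    return dict(totals)
```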
v0.1.0
Core platform
- Write prompts with full version history — iterate freely and roll back to any previous version at any time.
- Run your prompt on OpenAI, Anthropic, and Gemini models and compare outputs side-by-side in one place.
- Build an evaluation dataset and batch-test your prompt against every row at once — no manual copy-paste.
- Bring Your Own API Key: your credentials, your costs, no markup.
- Search across all your prompts instantly with full-text search.
- Sign in with just your email via OTP — no passwords to manage.
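Email OTP sign-in boils down to two steps, sketched below (a minimal illustration, not the production flow, which would also handle delivery, expiry, and rate limiting): generate a random numeric code, then verify the submitted code with a constant-time comparison.

```python
import secrets


def generate_otp(digits: int = 6) -> str:
    """Generate a cryptographically random numeric one-time code."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def verify_otp(submitted: str, expected: str) -> bool:
    """Compare codes in constant time to avoid timing side channels."""
    return secrets.compare_digest(submitted, expected)
```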