Product Updates

Last updated: March 2026

v0.4.1

Latest
March 2026

Token ledger & accounting hardening

  • Added a token ledger section to billing settings to inspect usage events in one place.
  • Improved token management for prompt runs and evaluation batches to keep accounting accurate under edge cases.

v0.4.0

March 2026

Billing resilience & safer runs

  • Added automatic Pro trial subscription setup for new organizations, so teams can start evaluating immediately with sensible defaults.
  • Implemented token consumption and token balance services to enforce usage accounting and prevent runs when quota is exhausted.
  • Hardened evaluation error handling in key run hooks/providers, giving clearer failure feedback during prompt and eval workflows.
  • Improved privacy by masking request IDs in run statistics before rendering in the UI.
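
As an illustration of the quota enforcement and request ID masking described above, here is a minimal Python sketch. It is not the PromptEvals implementation: `TokenBalance`, `QuotaExhaustedError`, and `mask_request_id` are hypothetical names, the balance is held in memory, and the choice to show only the last four characters of a request ID is an assumption.

```python
from dataclasses import dataclass


class QuotaExhaustedError(Exception):
    """Raised when a run would exceed the remaining token balance."""


@dataclass
class TokenBalance:
    # Hypothetical in-memory balance; a real service would persist this.
    limit: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def consume(self, tokens: int) -> int:
        """Record usage and return the new remaining balance."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if tokens > self.remaining:
            # Refuse the run before any usage is recorded.
            raise QuotaExhaustedError(
                f"requested {tokens} tokens, only {self.remaining} remaining"
            )
        self.used += tokens
        return self.remaining


def mask_request_id(request_id: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(request_id) <= visible:
        return "*" * len(request_id)
    return "*" * (len(request_id) - visible) + request_id[-visible:]
```

In this sketch a rejected run leaves the balance unchanged, so a failed check never counts against the quota.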

v0.3.0

March 2026

Evaluation intelligence

  • Introduced LLM-as-a-judge evaluations so you can score prompt outputs using model-based rubric checks instead of manual review.
  • Added first-class judge function support to make grading logic reusable across prompt versions and datasets.
  • Improved evaluation workflows with clearer run feedback and result visibility while comparing quality across models.
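
To make the idea of a reusable judge function concrete, the following Python sketch builds a judge from a rubric and any model callable. Everything here is an assumption for illustration: `make_judge`, `JudgeResult`, the 1 to 5 scale, and the pass threshold are hypothetical and do not reflect the actual PromptEvals API.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is any callable that maps a prompt string to a reply string.
ModelFn = Callable[[str], str]

VALID_SCORES = {"1", "2", "3", "4", "5"}


@dataclass
class JudgeResult:
    score: int
    passed: bool


def make_judge(
    model: ModelFn, rubric: str, pass_threshold: int = 4
) -> Callable[[str, str], JudgeResult]:
    """Build a reusable judge that scores an output from 1 to 5 against a rubric."""

    def judge(prompt: str, output: str) -> JudgeResult:
        grading_prompt = (
            f"Rubric: {rubric}\n"
            f"Prompt: {prompt}\n"
            f"Output: {output}\n"
            "Reply with a single integer score from 1 to 5."
        )
        reply = model(grading_prompt).strip()
        # Treat anything that is not a clean 1-5 integer as the lowest score.
        score = int(reply) if reply in VALID_SCORES else 1
        return JudgeResult(score=score, passed=score >= pass_threshold)

    return judge
```

Because the rubric is bound once in `make_judge`, the same judge can be applied to outputs from different prompt versions or dataset rows without repeating the grading logic.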

v0.2.1

February 2026

Launch & discovery

  • Launched the public homepage so anyone can understand what PromptEvals does before signing up.
  • Added a contact page for onboarding and plan questions — reach us directly without digging for an email.
  • Improved search engine presence with proper metadata, sitemap, and social previews.

v0.2.0

February 2026

Billing & workspaces

  • Subscribe to a plan and start a free trial directly in the app — no invoices, no back-and-forth.
  • Token-based billing: pay only for the LLM runs you actually make, tracked per billing period.
  • Organization workspaces keep your team's usage and billing cleanly separated.
  • A billing dashboard shows current usage, limits, and remaining quota at a glance.

v0.1.0

January 2026

Core platform

  • Write prompts with full version history: iterate freely and roll back to any previous version at any time.
  • Run your prompt on OpenAI, Anthropic, and Gemini models and compare outputs side-by-side in one place.
  • Build an evaluation dataset and batch-test your prompt against every row at once — no manual copy-paste.
  • Bring Your Own API Key: your credentials, your costs, no markup.
  • Search across all your prompts instantly with full-text search.
  • Sign in with just your email via OTP — no passwords to manage.
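
The batch-testing idea above (one prompt, every dataset row) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_batch` is a hypothetical helper, the `{name}` placeholder syntax is borrowed from Python's `str.format` and may differ from the product's template syntax, and the model is any callable that maps a string to a string.

```python
from typing import Callable, Dict, List


def run_batch(
    template: str,
    rows: List[Dict[str, str]],
    model: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Render the template once per dataset row and collect the model outputs."""
    results = []
    for row in rows:
        # Fill the template's {placeholders} from the row's columns.
        rendered = template.format(**row)
        results.append({"input": rendered, "output": model(rendered)})
    return results
```

A stub model such as `str.upper` is enough to try the loop locally before wiring in a real provider call.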