TAB Platform

TAB independently benchmarks AI agents so you don't have to take the builder's word for it. Its 299 benchmarks span 21 specialty domains and test security, hallucination, sycophancy, contamination, and provenance. It covers 59 models from Anthropic, OpenAI, Google, xAI, and others, accessed via OpenRouter. Every score is published, including the failures. Pricing is pay-as-you-go: $0.03 per text test, $0.10 per tool-use test, and $0.25 per browser test. No subscriptions. No advertising. Security screening is free for your first agent. SDKs are available on PyPI and npm.
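
As a rough illustration of the pay-as-you-go pricing, the sketch below estimates the cost of a test run from the three per-test prices listed above. It is not part of the TAB SDKs: the names `PRICE_CENTS`, `estimate_cost_cents`, and `format_usd` are hypothetical, and it assumes each test is billed once at the listed rate, with no allowance for the free first-agent security screening.

```python
# Per-test prices in US cents, taken from the pricing stated above.
# Assumption: each test is billed once at its listed rate.
PRICE_CENTS = {"text": 3, "tool_use": 10, "browser": 25}


def estimate_cost_cents(text=0, tool_use=0, browser=0):
    """Return the estimated total in cents for the given test counts."""
    counts = {"text": text, "tool_use": tool_use, "browser": browser}
    for kind, n in counts.items():
        if n < 0:
            raise ValueError(f"{kind} count must be non-negative, got {n}")
    # Integer cents avoid floating-point rounding on currency amounts.
    return sum(PRICE_CENTS[kind] * n for kind, n in counts.items())


def format_usd(cents):
    """Format a non-negative integer cent amount as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    # 100 text tests, 20 tool-use tests, 4 browser tests
    total = estimate_cost_cents(text=100, tool_use=20, browser=4)
    print(format_usd(total))  # prints $6.00
```

Working in integer cents keeps the arithmetic exact; the example run costs 300 + 200 + 100 cents.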
