TPS stands for tokens per second: the rate at which the model produces text, which determines how quickly a full response completes. Higher values mean greater throughput and faster responses.

TheFastest.ai - AI Programming Freemium AI Tool

📋

Overview

This site positions itself as a performance measurement tool for large language models, built on the premise that model responses should be as fast as human conversation. It is primarily used by developers and AI enthusiasts to compare latency and throughput across model providers. The target audience includes engineers optimizing AI applications and researchers tracking model efficiency.

⚡

Core Features

Model Filtering: Users can filter the results using text fields in the header to compare specific models, such as Llama 3.1 or GPT-4, across providers.
TTFT Measurement: The tool measures Time To First Token to indicate how quickly the model begins outputting text.
TPS Tracking: Tokens per second (TPS) is tracked to show how quickly the model produces text once output has started, which determines how long a full response takes.
Total Time Calculation: The total time from request start to response completion is calculated to measure overall latency.
Daily Updates: Statistics are updated daily so the performance data stays current.
Distributed Testing: Tests are run daily in multiple data centers using Fly.io to ensure a distributed footprint.
Outlier Removal: Three separate inferences are performed per provider, keeping the best result to remove outliers.
Open Data: All raw data is available in a public GCS bucket for transparency.
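
The three metrics above can be sketched from timestamps recorded around a single streamed response. This is a minimal illustration, not the site's actual code: the `Timing` class and its field names are hypothetical, and computing TPS over the generation phase only (after the first token arrives) is an assumption, since the listing does not give the exact formula.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """Timestamps in seconds captured around one streamed response (hypothetical)."""
    request_start: float  # when the request was sent
    first_token: float    # when the first output token arrived
    response_end: float   # when the last output token arrived
    output_tokens: int    # number of tokens the model generated


def ttft(t: Timing) -> float:
    """Time To First Token: delay before the model begins outputting text."""
    return t.first_token - t.request_start


def total_time(t: Timing) -> float:
    """Total time from request start to response completion."""
    return t.response_end - t.request_start


def tps(t: Timing) -> float:
    """Tokens per second over the generation phase (assumed definition)."""
    generation = t.response_end - t.first_token
    if generation <= 0:
        raise ValueError("response_end must be later than first_token")
    return t.output_tokens / generation


# Illustrative numbers only, not measured data.
sample = Timing(request_start=0.0, first_token=0.25, response_end=0.75, output_tokens=20)
print(ttft(sample), tps(sample), total_time(sample))  # 0.25 40.0 0.75
```

Under this assumed definition, a low TTFT and a high TPS together produce a low total time, which is why the site reports all three.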

🚀

How to Use

Access the Site: Navigate to the website directly in your browser to view the latest benchmark stats.
Filter Models: Use the text fields in the header to filter specific models or providers you want to compare.
Review Metrics: Examine the TTFT, TPS, and Total time columns to understand model performance.
Check Methodology: Read the methodology section to understand how input and output tokens are standardized.
Request Benchmarks: File an issue on GitHub if you want a specific model to be benchmarked.
Access Raw Data: Download or view the raw data from the public GCS bucket if needed for analysis.
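
For readers who download the raw data, the filter and review steps above might look like the sketch below. The row layout (`provider`, `model`, `ttft`, `tps`, `total`) is a hypothetical schema chosen for illustration, the figures are made up, and case-insensitive substring matching is an assumption about how the header filters behave.

```python
from typing import TypedDict


class Row(TypedDict):
    """One benchmark result (hypothetical schema, not the site's actual format)."""
    provider: str
    model: str
    ttft: float   # seconds
    tps: float    # tokens per second
    total: float  # seconds


def filter_rows(rows: list[Row], query: str) -> list[Row]:
    """Keep rows whose provider or model contains the query, ignoring case."""
    q = query.strip().lower()
    return [r for r in rows if q in r["provider"].lower() or q in r["model"].lower()]


# Made-up figures for illustration only.
rows: list[Row] = [
    {"provider": "provider-a", "model": "llama-3.1-8b", "ttft": 0.30, "tps": 120.0, "total": 0.47},
    {"provider": "provider-b", "model": "llama-3.1-8b", "ttft": 0.20, "tps": 90.0, "total": 0.42},
    {"provider": "provider-c", "model": "gpt-4", "ttft": 0.50, "tps": 40.0, "total": 1.00},
]

# Filter to one model family, then rank the matches by TTFT (lowest first).
matches = sorted(filter_rows(rows, "Llama"), key=lambda r: r["ttft"])
print([r["provider"] for r in matches])  # ['provider-b', 'provider-a']
```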

✨

Key Advantages

Daily Updates: Stats are updated daily, providing fresh data compared to static benchmark reports.
Transparent Methodology: The site clearly defines how TTFT, TPS, and Total time are measured and calculated.
Distributed Footprint: Testing occurs in multiple Fly.io data centers (cdg in Paris, iad in Ashburn, Virginia, and sea in Seattle) to reduce regional bias.
Outlier Management: The "Try 3, Keep 1" approach ensures results are not skewed by queuing or temporary latency spikes.
Open Source: The website source code and benchmarking tools are available on GitHub for verification.
Public Raw Data: Users can access the underlying data in a public GCS bucket for independent analysis.
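
The "Try 3, Keep 1" idea can be sketched as follows. This is an illustration under stated assumptions: the listing says only that the best of three inferences is kept, so treating "best" as the lowest total time is an assumption, and `try_three_keep_one` and `run_once` are hypothetical names rather than functions from the ai-benchmarks repo.

```python
from typing import Callable


def try_three_keep_one(run_once: Callable[[], float], attempts: int = 3) -> float:
    """Run the measurement `attempts` times and keep the lowest total time.

    Taking the minimum discards runs slowed by queuing or a transient
    latency spike (assumes "best" means fastest).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return min(run_once() for _ in range(attempts))


# Stand-in for a real inference call: canned total times in seconds,
# where the first run hit a simulated latency spike.
canned = iter([0.92, 0.41, 0.45])
best = try_three_keep_one(lambda: next(canned))
print(best)  # 0.41
```

Keeping the minimum rather than the mean means one slow run cannot inflate the reported figure, at the cost of showing a best case rather than typical performance.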

💰

Pricing

Tier	Price	Description

❓

FAQ

What does TTFT stand for?

How is TPS defined?

How often are the statistics updated?

How do you handle outliers in testing?

Can I request a benchmark for a specific model?

What is the input token count for tests?

How many output tokens are generated?

Where are the tests run from?

🛟

Get Help

GitHub Issues: Users can file an issue on GitHub to request benchmarks for specific models.
Source Code: Full source code for the site is available on GitHub for technical reference.
Fixie Website: The tool is built by Fixie, and users can visit their main site for more context.
Raw Data Access: Users can access raw data in the public GCS bucket for analysis.
Benchmarking Tools: The full test suite is available in the ai-benchmarks repo on GitHub.

📥