LiteLLM v1.71.1 Benchmarks

Overview​

This document presents performance benchmarks comparing LiteLLM v1.71.1 against prior LiteLLM versions.

Related PR: #11097

Testing Methodology​

The load testing was conducted using the following parameters:

  • Request Rate: 200 RPS (Requests Per Second)
  • User Ramp Up: 200 concurrent users
  • Transport Comparison: httpx (existing) vs aiohttp (new implementation)
  • Number of LiteLLM pods/instances: 1
  • Machine Specs: 2 vCPUs, 4GB RAM
  • LiteLLM Settings:
    • Tested against a fake OpenAI endpoint
    • Set USE_AIOHTTP_TRANSPORT="True" in the environment variables; this feature flag enables the aiohttp transport.
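
The feature flag from the settings above can also be set from Python before the proxy starts (a minimal sketch; equivalently, export `USE_AIOHTTP_TRANSPORT="True"` in the shell before launching LiteLLM):

```python
import os

# Feature flag from the benchmark setup: opts LiteLLM into the aiohttp
# transport instead of the existing httpx one. It must be present in the
# environment before LiteLLM starts so it is read at startup.
os.environ["USE_AIOHTTP_TRANSPORT"] = "True"

print(os.environ["USE_AIOHTTP_TRANSPORT"])  # → True
```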

Benchmark Results​

| Metric          | httpx (Existing) | aiohttp (LiteLLM v1.71.1) | Improvement | Calculation                      |
|-----------------|------------------|---------------------------|-------------|----------------------------------|
| RPS             | 50.2             | 224                       | +346% ✅    | (224 − 50.2) / 50.2 × 100 = 346% |
| Median Latency  | 2,500ms          | 74ms                      | −97% ✅     | (74 − 2500) / 2500 × 100 = −97%  |
| 95th Percentile | 5,600ms          | 250ms                     | −96% ✅     | (250 − 5600) / 5600 × 100 = −96% |
| 99th Percentile | 6,200ms         | 330ms                     | −95% ✅     | (330 − 6200) / 6200 × 100 = −95% |
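
The Improvement column follows the standard percentage-change formula shown in the Calculation column. A quick sanity check of those figures (the helper name `pct_change` is ours, not from LiteLLM):

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new: (new - old) / old * 100."""
    return (new - old) / old * 100

# Reproduce the table's Improvement column from the raw measurements.
print(round(pct_change(50.2, 224)))   # RPS              → 346
print(round(pct_change(2500, 74)))    # Median latency   → -97
print(round(pct_change(5600, 250)))   # 95th percentile  → -96
print(round(pct_change(6200, 330)))   # 99th percentile  → -95

# Throughput ratio quoted below as "4.5x".
print(round(224 / 50.2, 1))           # → 4.5
```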

Key Improvements​

  • 4.5x increase in requests per second (from 50.2 to 224 RPS)
  • 97% reduction in median response time (from 2.5 seconds to 74ms)
  • 96% reduction in 95th percentile latency (from 5.6 seconds to 250ms)
  • 95% reduction in 99th percentile latency (from 6.2 seconds to 330ms)