Shared Session Support

Overview

LiteLLM now supports sharing aiohttp.ClientSession instances across multiple API calls to avoid creating unnecessary new sessions. This improves performance and resource utilization.

Usage

Basic Usage

import asyncio
from aiohttp import ClientSession
from litellm import acompletion

async def main():
    # Create a shared session
    async with ClientSession() as shared_session:
        # Use the same session for multiple calls
        response1 = await acompletion(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}],
            shared_session=shared_session
        )
        
        response2 = await acompletion(
            model="gpt-4o", 
            messages=[{"role": "user", "content": "How are you?"}],
            shared_session=shared_session
        )
        
        # Both calls reuse the same session!

asyncio.run(main())

Without Shared Session (Default)

import asyncio
from litellm import acompletion

async def main():
    # Each call creates a new session
    response1 = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    response2 = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "How are you?"}]
    )
    # Two separate sessions created

asyncio.run(main())

Benefits

Performance: Reuse HTTP connections across multiple calls
Resource Efficiency: Reduce memory and connection overhead
Better Control: Manage session lifecycle explicitly
Debugging: Easy to trace which calls use which sessions

Debug Logging

Enable debug logging to see session reuse in action:

import os
import litellm

# Enable debug logging
os.environ['LITELLM_LOG'] = 'DEBUG'

# You'll see logs like:
# 🔄 SHARED SESSION: acompletion called with shared_session (ID: 12345)
# ✅ SHARED SESSION: Reusing existing ClientSession (ID: 12345)

Common Patterns

FastAPI Integration

from fastapi import FastAPI
import aiohttp
import litellm

app = FastAPI()

@app.post("/chat")
async def chat(messages: list[dict]):
    # Create session per request
    async with aiohttp.ClientSession() as session:
        return await litellm.acompletion(
            model="gpt-4o",
            messages=messages,
            shared_session=session
        )

Batch Processing

import asyncio
from aiohttp import ClientSession
from litellm import acompletion

async def process_batch(messages_list):
    async with ClientSession() as shared_session:
        tasks = []
        for messages in messages_list:
            task = acompletion(
                model="gpt-4o",
                messages=messages,
                shared_session=shared_session
            )
            tasks.append(task)
        
        # All tasks use the same session
        results = await asyncio.gather(*tasks)
        return results

Custom Session Configuration

import aiohttp
import litellm

# Create optimized session
async with aiohttp.ClientSession(
    timeout=aiohttp.ClientTimeout(total=180),
    connector=aiohttp.TCPConnector(limit=300, limit_per_host=75)
) as shared_session:
    
    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        shared_session=shared_session
    )

Implementation Details

The shared_session parameter is threaded through the entire LiteLLM call chain:

acompletion() - Accepts shared_session parameter
BaseLLMHTTPHandler - Passes session to HTTP client creation
AsyncHTTPHandler - Uses existing session if provided
LiteLLMAiohttpTransport - Reuses the session for HTTP requests

Backward Compatibility

100% backward compatible - Existing code works unchanged
Optional parameter - shared_session=None by default
No breaking changes - All existing functionality preserved

Testing

Test the shared session functionality:

import asyncio
from aiohttp import ClientSession
from litellm import acompletion

async def test_shared_session():
    async with ClientSession() as session:
        print(f"✅ Created session: {id(session)}")
        
        try:
            response = await acompletion(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Hello"}],
                shared_session=session,
                api_key="your-api-key"
            )
            print(f"Response: {response.choices[0].message.content}")
        except Exception as e:
            print(f"✅ Expected error: {type(e).__name__}")
        
        print("✅ Session control working!")

asyncio.run(test_shared_session())

Files Modified

The shared session functionality was added to these files:

litellm/main.py - Added shared_session parameter to acompletion() and completion()
litellm/llms/custom_httpx/http_handler.py - Core session reuse logic
litellm/llms/custom_httpx/llm_http_handler.py - HTTP handler integration
litellm/llms/openai/openai.py - OpenAI provider integration
litellm/llms/openai/common_utils.py - OpenAI client creation
litellm/llms/azure/chat/o_series_handler.py - Azure O Series handler

Troubleshooting

Session Not Being Reused

Check debug logs: Enable LITELLM_LOG=DEBUG to see session reuse messages
Verify session is not closed: Ensure the session is still active when making calls
Check parameter passing: Make sure shared_session is passed to all acompletion() calls

Performance Issues

Session configuration: Tune aiohttp.ClientSession parameters for your use case
Connection limits: Adjust limit and limit_per_host in TCPConnector
Timeout settings: Configure appropriate timeouts for your environment

Overview​

Usage​

Basic Usage​

Without Shared Session (Default)​

Benefits​

Debug Logging​

Common Patterns​

FastAPI Integration​

Batch Processing​

Custom Session Configuration​

Implementation Details​

Backward Compatibility​

Testing​

Files Modified​

Troubleshooting​

Session Not Being Reused​

Performance Issues​