DAY 0 Support: Gemini 3.5 Flash on LiteLLM

May 19, 2026

Sameer Kankute

SWE @ LiteLLM (LLM Translation)

Krrish Dholakia

CEO, LiteLLM

Ishaan Jaffer

CTO, LiteLLM

LiteLLM now supports gemini-3.5-flash on day 0!

note

If you only need cost tracking, your current LiteLLM version works without changes. To use the new features introduced with this release (thinking levels, strict function-call IDs, and thought signatures), upgrade to the latest LiteLLM release.

Deploy this version

Docker

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-stable

Pip

pip install litellm --upgrade

What's New

1. Minimal thinking level

Gemini 3.5 Flash supports the new "minimal" thinking level. LiteLLM maps the OpenAI reasoning_effort parameter to Gemini's thinkingLevel, so pass reasoning_effort="minimal" to select it.

SDK

from litellm import completion

response = completion(
    model="gemini/gemini-3.5-flash",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    reasoning_effort="minimal",
)

print(response.choices[0].message.content)

PROXY

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning_effort": "minimal"
  }'

`reasoning_effort`	`thinkingLevel`
`minimal`	`minimal`

2. Strict function calling

Gemini 3.5+ requires every functionResponse to include the same id as the originating functionCall, plus the matching function name. LiteLLM round-trips this through standard OpenAI fields: tool_calls[].id on the assistant message, and the same value as tool_call_id on the tool result.

How the tool-call loop works

Step 1: User submits a query that would trigger a tool call

Send the user message and your tool definitions. The model responds with tool_calls — save the id from the first tool call (it may look like 5x450f94__thought__<signature>; pass it back unchanged on the next request).

curl -sS http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in Tokyo right now?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": { "type": "string" }
            },
            "required": ["city"]
          }
        }
      }
    ]
  }' | tee /tmp/gemini_tool_step1.json | jq .

Copy the tool call id from the response:

TOOL_CALL_ID=$(jq -r '.choices[0].message.tool_calls[0].id' /tmp/gemini_tool_step1.json)
echo "$TOOL_CALL_ID"
# e.g. 5x450f94__thought__EvACCu0CAQw51sdR...
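For readers working in Python rather than jq, the same extraction can be sketched as below. This is a minimal illustration, not LiteLLM code: `extract_tool_calls` is a hypothetical helper, and `sample_body` is a made-up response trimmed to the fields the helper reads (the id value is a placeholder, not a real thought signature).

```python
import json


def extract_tool_calls(response_body: str) -> list:
    """Return id, name, and parsed arguments for each tool call in the first choice."""
    message = json.loads(response_body)["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        calls.append({
            "id": call["id"],  # keep verbatim; never trim or re-encode it
            "name": call["function"]["name"],
            "arguments": json.loads(call["function"]["arguments"]),
        })
    return calls


# Hypothetical response body, trimmed to the fields read above.
sample_body = json.dumps({
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_123__thought__PLACEHOLDER",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
            }],
        }
    }]
})

first_call = extract_tool_calls(sample_body)[0]
print(first_call["id"])
```

The helper returns an empty list when the model answers directly without requesting a tool, so callers can branch on that instead of catching a KeyError.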

Step 2: Run your tool, then send the result with the same tool_call_id

Run get_weather locally, then call the proxy again with the full message history. Set tool_call_id to the exact id from Step 1 — LiteLLM uses it as the Gemini functionResponse.id.

# Result from your local get_weather("Tokyo") call
WEATHER_RESULT='{"temp_c": 18, "condition": "clear"}'

curl -sS http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d "$(jq -n \
    --arg id "$TOOL_CALL_ID" \
    --arg content "$WEATHER_RESULT" \
    '{
      model: "gemini-3.5-flash",
      messages: [
        {role: "user", content: "What is the weather in Tokyo right now?"},
        {
          role: "assistant",
          content: null,
          tool_calls: [{
            id: $id,
            type: "function",
            function: {name: "get_weather", arguments: "{\"city\": \"Tokyo\"}"}
          }]
        },
        {role: "tool", tool_call_id: $id, content: $content}
      ],
      tools: [{
        type: "function",
        function: {
          name: "get_weather",
          description: "Get current weather for a city",
          parameters: {
            type: "object",
            properties: {city: {type: "string"}},
            required: ["city"]
          }
        }
      }]
    }')" | jq .

The id on the assistant tool_calls entry and the tool_call_id on the role: tool message must match. The function name must match the tool definition (get_weather).
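A client-side check of these two rules can catch mismatches before the request is sent. The sketch below is an assumption about how one might do that; `check_tool_history` is a hypothetical helper and not part of LiteLLM, and the id used in the sample history is a placeholder.

```python
def check_tool_history(messages, tools):
    """Return a list of problems; an empty list means ids and names line up."""
    defined_names = {tool["function"]["name"] for tool in tools}
    seen_ids = set()
    problems = []
    for message in messages:
        if message["role"] == "assistant":
            for call in message.get("tool_calls") or []:
                seen_ids.add(call["id"])
                name = call["function"]["name"]
                if name not in defined_names:
                    problems.append(f"function name not in tools: {name}")
        elif message["role"] == "tool":
            tool_call_id = message.get("tool_call_id")
            if tool_call_id not in seen_ids:
                problems.append(f"tool_call_id without a matching tool call: {tool_call_id}")
    return problems


tools = [{"type": "function", "function": {"name": "get_weather"}}]
call_id = "call_123__thought__PLACEHOLDER"  # placeholder, not a real id
history = [
    {"role": "user", "content": "What is the weather in Tokyo right now?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": call_id,
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
        }],
    },
    {"role": "tool", "tool_call_id": call_id, "content": "{\"temp_c\": 18}"},
]

print(check_tool_history(history, tools))
```

Truncating or regenerating the id on the tool message makes the check report a mismatch, which is the situation the strict matching rule rejects.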

Step 3: Model produces the final answer

LiteLLM sends the matching id and name on the Gemini functionResponse part. The model then returns a normal assistant message with the weather summary.
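The three steps can be tied together in one loop. The sketch below is illustrative only: `run_tool_loop`, `LOCAL_TOOLS`, and `fake_complete` are hypothetical names, and `fake_complete` is an offline stub that imitates the shape of a chat completion response so the example runs without a network call. It rebuilds the assistant message by hand, mirroring the Step 2 request; whether a real response carries extra fields worth preserving is not covered here.

```python
import json
from types import SimpleNamespace


def get_weather(city):
    # Stand-in for a real lookup; returns the sample values used in Step 2.
    return {"temp_c": 18, "condition": "clear"}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_loop(complete, model, messages, tools, max_rounds=5):
    """Call `complete` until the model stops requesting tools; return the final text."""
    messages = list(messages)
    for _ in range(max_rounds):
        message = complete(model=model, messages=messages, tools=tools).choices[0].message
        tool_calls = getattr(message, "tool_calls", None) or []
        if not tool_calls:
            return message.content
        # Echo the assistant turn back with the ids exactly as received.
        messages.append({
            "role": "assistant",
            "content": message.content,
            "tool_calls": [
                {
                    "id": call.id,
                    "type": "function",
                    "function": {"name": call.function.name, "arguments": call.function.arguments},
                }
                for call in tool_calls
            ],
        })
        for call in tool_calls:
            result = LOCAL_TOOLS[call.function.name](**json.loads(call.function.arguments))
            # Same id on the tool result as on the originating tool call.
            messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
    raise RuntimeError("model kept requesting tools")


def fake_complete(model, messages, tools):
    """Offline stub shaped like a chat completion response, for illustration only."""
    if messages[-1]["role"] == "tool":
        reply = SimpleNamespace(content="Tokyo: " + messages[-1]["content"], tool_calls=None)
    else:
        call = SimpleNamespace(
            id="call_123__thought__PLACEHOLDER",
            function=SimpleNamespace(name="get_weather", arguments='{"city": "Tokyo"}'),
        )
        reply = SimpleNamespace(content=None, tool_calls=[call])
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


answer = run_tool_loop(
    fake_complete,
    "gemini/gemini-3.5-flash",
    [{"role": "user", "content": "What is the weather in Tokyo right now?"}],
    tools=[],
)
print(answer)
```

Passing the completion function as an argument keeps the loop testable offline; in a real run one would presumably pass `litellm.completion` together with the tool definitions from Step 1.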

3. Sampling parameters (`temperature`, `top_p`, `top_k`)

Google has advised moving away from temperature, top_p, and top_k for Gemini 3.5+ and steering sampling behavior through system instructions instead. These parameters still work today, but may be removed in a future API release.

LiteLLM follows the same guidance: when you pass temperature, top_p, or top_k on Gemini 3+ models, you will see a deprecation warning in the logs recommending system-instruction-based sampling instead.
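One way to prepare existing call sites for this change is to strip the three parameters and express the intent in a system message instead. The helper below is a hypothetical sketch, not LiteLLM behavior; the instruction text is an example, and no claim is made that a given wording reproduces a particular temperature setting.

```python
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def move_sampling_to_system(request, instruction):
    """Return (cleaned_request, dropped_names) without mutating the input.

    If any sampling parameter was present, `instruction` is prepended as a
    system message so the intent is carried in the prompt instead.
    """
    dropped = [name for name in SAMPLING_PARAMS if name in request]
    cleaned = {key: value for key, value in request.items() if key not in SAMPLING_PARAMS}
    if dropped:
        cleaned["messages"] = [{"role": "system", "content": instruction}] + list(
            request.get("messages", [])
        )
    return cleaned, dropped


request = {
    "model": "gemini/gemini-3.5-flash",
    "messages": [{"role": "user", "content": "Summarize this article in 3 bullet points."}],
    "temperature": 0.2,
    "top_p": 0.9,
}

cleaned, dropped = move_sampling_to_system(
    request, "Answer conservatively and avoid speculative phrasing."
)
print(dropped)
```

Requests that never set a sampling parameter pass through unchanged, so the helper can wrap every call site without adding system messages where none are needed.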

Quick Start

SDK

from litellm import completion

response = completion(
    model="gemini/gemini-3.5-flash",
    messages=[{"role": "user", "content": "Summarize this article in 3 bullet points."}],
)

print(response.choices[0].message.content)

PROXY

1. Set up config.yaml

model_list:
  - model_name: gemini-3.5-flash
    litellm_params:
      model: gemini/gemini-3.5-flash
      api_key: os.environ/GEMINI_API_KEY

  # Or use Vertex AI
  - model_name: vertex-gemini-3.5-flash
    litellm_params:
      model: vertex_ai/gemini-3.5-flash
      vertex_project: your-project-id
      vertex_location: us-central1

2. Start proxy

litellm --config /path/to/config.yaml

3. Make requests

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
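The same request can be issued from Python using only the standard library. This is a minimal sketch under the same assumptions as the curl command (proxy on localhost:4000, a placeholder key); `build_chat_request` is a hypothetical helper, and the send step is left commented out so the snippet runs without a proxy.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages, **extra):
    """Build a POST request for the proxy's /v1/chat/completions endpoint."""
    body = {"model": model, "messages": messages, **extra}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://localhost:4000",
    "<YOUR-LITELLM-KEY>",
    "gemini-3.5-flash",
    [{"role": "user", "content": "Hello!"}],
)

# To send it against a running proxy:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

Extra keyword arguments are merged into the JSON body, so `reasoning_effort="minimal"` can be passed the same way as in the earlier curl example.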

Supported Endpoints

LiteLLM provides full end-to-end support for Gemini 3.5 Flash on:

✅ /v1/chat/completions - OpenAI-compatible chat completions endpoint
✅ /v1/responses - OpenAI Responses API endpoint (streaming and non-streaming)
✅ /v1/messages - Anthropic-compatible messages endpoint
✅ /v1/generateContent - Google Gemini API-compatible endpoint

All endpoints support:

Streaming and non-streaming responses
Function calling with thought signatures
Multi-turn conversations
All Gemini 3-specific features (thinking levels, thought signatures)
Full multimodal support (text, image, audio, video)
