
NVIDIA Riva (Speech-to-Text)

LiteLLM supports NVIDIA Riva for speech-to-text via /audio/transcriptions. Works with both the NVCF-hosted Riva endpoint (e.g. Parakeet on build.nvidia.com) and self-hosted Riva deployments.

| Property | Details |
| --- | --- |
| Description | Riva is NVIDIA's GPU-accelerated speech AI. LiteLLM streams the audio to Riva over gRPC and returns OpenAI-compatible transcripts. |
| Provider Route on LiteLLM | `nvidia_riva/` |
| Provider Doc | Riva ASR docs ↗ |
| Transport | gRPC (not HTTP) |
| Supported OpenAI Endpoints | `/audio/transcriptions` |
Optional install

nvidia_riva requires the gRPC client and audio decoding libraries. Install them with:

```shell
pip install 'litellm[stt-nvidia-riva]'
```

This pulls in nvidia-riva-client, soundfile, audioread, and numpy. They are imported lazily so the rest of LiteLLM keeps working without them.

Quick Start

```python
from litellm import transcription
import os

os.environ["NVIDIA_RIVA_API_KEY"] = "nvapi-..."  # your nvapi key

audio_file = open("/path/to/audio.mp3", "rb")

response = transcription(
    model="nvidia_riva/nvidia/parakeet-ctc-1_1b-asr",
    file=audio_file,
    api_base="grpc.nvcf.nvidia.com:443",
    nvcf_function_id="1598d209-5e27-4d3c-8079-4751568b1081",  # NVCF function id
)

print(response.text)
```

LiteLLM resamples the audio to 16 kHz mono LINEAR_PCM (Riva's required wire format) before streaming, so you can send mp3 / wav / flac / ogg directly. No need to preprocess.
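Under the hood this preprocessing is just a downmix, a resample, and a 16-bit quantize. A minimal sketch of the equivalent transformation using only numpy (the function name `to_riva_pcm` and the linear-interpolation resampler are illustrative; LiteLLM's internal implementation may differ):

```python
import numpy as np

TARGET_SR = 16_000  # Riva's required wire format: 16 kHz mono LINEAR_PCM

def to_riva_pcm(samples: np.ndarray, sample_rate: int) -> bytes:
    """Downmix to mono, resample to 16 kHz, and encode as 16-bit little-endian PCM.

    `samples` is float audio in [-1.0, 1.0], shaped (n,) or (n, channels).
    """
    if samples.ndim == 2:  # (n, channels) -> mono by averaging channels
        samples = samples.mean(axis=1)
    if sample_rate != TARGET_SR:  # naive linear-interpolation resample
        duration = len(samples) / sample_rate
        n_out = int(round(duration * TARGET_SR))
        t_in = np.linspace(0.0, duration, num=len(samples), endpoint=False)
        t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
        samples = np.interp(t_out, t_in, samples)
    pcm = np.clip(samples, -1.0, 1.0)
    return (pcm * 32767).astype("<i2").tobytes()  # LINEAR_PCM, s16le

# One second of 44.1 kHz stereo becomes one second of 16 kHz mono PCM:
stereo = np.zeros((44_100, 2), dtype=np.float64)
pcm_bytes = to_riva_pcm(stereo, 44_100)
print(len(pcm_bytes))  # 16,000 samples * 2 bytes = 32000
```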

Deployment modes

Riva runs in two very different shapes: NVIDIA-hosted NVCF functions and self-hosted deployments. LiteLLM uses the presence of nvcf_function_id to pick the default for use_ssl, but you can always override it.
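The defaulting rule is simple enough to sketch (the helper name `resolve_use_ssl` is illustrative, not a LiteLLM API):

```python
def resolve_use_ssl(use_ssl=None, nvcf_function_id=None):
    """An explicit use_ssl always wins; otherwise TLS defaults on only for NVCF."""
    if use_ssl is not None:
        return use_ssl
    return nvcf_function_id is not None

print(resolve_use_ssl(nvcf_function_id="1598d209-5e27-4d3c-8079-4751568b1081"))  # True
print(resolve_use_ssl())              # False (plain self-hosted Riva)
print(resolve_use_ssl(use_ssl=True))  # True (self-hosted behind a TLS ingress)
```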

NVCF (NVIDIA-hosted)

```yaml
model_list:
  - model_name: parakeet-asr
    litellm_params:
      model: nvidia_riva/nvidia/parakeet-ctc-1_1b-asr
      api_base: grpc.nvcf.nvidia.com:443
      api_key: os.environ/NVIDIA_RIVA_API_KEY # nvapi-...
      nvcf_function_id: 1598d209-5e27-4d3c-8079-4751568b1081
```

When nvcf_function_id is set, LiteLLM:

  • enables TLS (use_ssl=True)
  • attaches the function-id gRPC metadata
  • attaches authorization: Bearer <api_key>

Self-hosted (no TLS)

```yaml
model_list:
  - model_name: parakeet-asr
    litellm_params:
      model: nvidia_riva/nvidia/parakeet-ctc-1_1b-asr
      api_base: localhost:50051
```

Self-hosted behind an ingress with TLS

```yaml
model_list:
  - model_name: parakeet-asr
    litellm_params:
      model: nvidia_riva/nvidia/parakeet-ctc-1_1b-asr
      api_base: riva.internal.company.com:443
      use_ssl: true
```

LiteLLM Proxy Usage

1. Add the model to your config

```yaml
model_list:
  - model_name: parakeet-asr
    litellm_params:
      model: nvidia_riva/nvidia/parakeet-ctc-1_1b-asr
      api_base: grpc.nvcf.nvidia.com:443
      api_key: os.environ/NVIDIA_RIVA_API_KEY
      nvcf_function_id: 1598d209-5e27-4d3c-8079-4751568b1081
    model_info:
      mode: audio_transcription

general_settings:
  master_key: sk-1234
```

2. Start the proxy

```shell
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

3. Send a request

```shell
curl --location 'http://0.0.0.0:4000/v1/audio/transcriptions' \
  --header 'Authorization: Bearer sk-1234' \
  --form 'file=@"/path/to/speech.mp3"' \
  --form 'model="parakeet-asr"'
```

Supported parameters

OpenAI parameters that map cleanly to Riva:

| OpenAI param | Behavior |
| --- | --- |
| `language` | Mapped to Riva `language_code`. Bare codes like `en` are normalized to `en-US`. BCP-47 codes like `de-DE` pass through. |
| `response_format` | `json` (default) returns `{ "text": "..." }`. `verbose_json` adds `duration` and `words` (timestamps in seconds). |
| `timestamp_granularities` | Pass `["word"]` to enable word-level timestamps. |
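The bare-code normalization can be sketched like this (the helper name and the alias table are illustrative; the table above only documents `en` -> `en-US`):

```python
def normalize_language(code: str) -> str:
    """Bare codes get a default region; BCP-47 codes pass through unchanged."""
    aliases = {"en": "en-US"}  # assumed table; only en -> en-US is documented
    return aliases.get(code, code)

print(normalize_language("en"))     # en-US
print(normalize_language("de-DE"))  # de-DE (passes through)
```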

Riva-specific parameters you can set in litellm_params (or pass directly to transcription(...)):

| Param | Default | Purpose |
| --- | --- | --- |
| `nvcf_function_id` | unset | NVCF function id. When set, defaults `use_ssl=True` and attaches NVCF metadata. |
| `use_ssl` | `True` if `nvcf_function_id` is set, else `False` | Force TLS on or off. Useful for self-hosted Riva behind a TLS ingress. |
| `riva_model_name` | `""` (auto-select) | Override the internal Riva model name. Leaving it empty lets Riva pick based on `language_code` + `sample_rate_hertz`. Recommended unless you know exactly what you want. |
| `enable_automatic_punctuation` | `True` | Standard Riva flag. |
| `endpointing_config` | unset | Pass a dict that mirrors Riva's `EndpointingConfig` (`start_threshold`, `stop_threshold`, `stop_history`, `stop_history_eou`, ...). |
| `chunking_strategy` | unset | OpenAI-style VAD config (`{"type": "server_vad", "threshold": 0.5, "silence_duration_ms": 700, "prefix_padding_ms": 250}`). LiteLLM translates it to Riva's `EndpointingConfig`. |
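For example, a self-hosted entry combining several of these parameters (values are illustrative, taken from the defaults above):

```yaml
model_list:
  - model_name: parakeet-asr
    litellm_params:
      model: nvidia_riva/nvidia/parakeet-ctc-1_1b-asr
      api_base: riva.internal.company.com:443
      use_ssl: true
      enable_automatic_punctuation: true
      chunking_strategy:
        type: server_vad
        threshold: 0.5
        silence_duration_ms: 700
        prefix_padding_ms: 250
```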

Why is riva_model_name empty by default?

Internal Riva deployment names like parakeet-1.1b-en-US-asr-streaming-silero-vad-sortformer are NVIDIA's deployment identifiers. They change across NIM versions, regions, and self-hosted builds. Leaving model="" in RecognitionConfig lets Riva auto-select the right one based on language_code and sample_rate_hertz — which is what you almost always want. Only set riva_model_name if you have a specific deployed model you need to pin.

Audio formats

LiteLLM decodes inbound audio with soundfile (wav / flac / ogg) and falls back to audioread for mp3 / m4a / mp4 / webm. Audio is then resampled to 16 kHz mono LINEAR_PCM before streaming to Riva.

If decoding fails (e.g. exotic codecs, DRM, or audioread not installed), LiteLLM raises a clear error asking you to convert upstream:

```shell
ffmpeg -i input.mp3 -ac 1 -ar 16000 -sample_fmt s16 output.wav
```

Environment variables

| Variable | Purpose |
| --- | --- |
| `NVIDIA_RIVA_API_KEY` | API key sent as `authorization: Bearer ...`. NVCF expects `nvapi-...`. |
| `NVIDIA_RIVA_API_BASE` | Default `host:port` for the gRPC endpoint. Same effect as setting `api_base` in `litellm_params`. |
| `NVIDIA_NIM_API_KEY` | Used as a fallback for `NVIDIA_RIVA_API_KEY`, since most users reuse the same `nvapi-...` key across NVCF services. |

Notes & limitations

  • Transport is gRPC streaming. NVCF only supports streaming ASR today, so even short files are sent as a stream.
  • Diarization (diarization_config) and srt / vtt response formats aren't wired up yet — open an issue if you need them.
  • Cost calc: Riva doesn't return token usage. LiteLLM stores the audio duration on _hidden_params["audio_transcription_duration"] so cost can be derived externally.
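A minimal sketch of deriving a dollar cost from that stored duration (the per-minute rate below is hypothetical, not an NVIDIA price):

```python
PRICE_PER_MINUTE = 0.006  # hypothetical rate; substitute your actual pricing

def transcription_cost(hidden_params: dict) -> float:
    """Derive cost from the duration LiteLLM records on the response."""
    duration_s = hidden_params["audio_transcription_duration"]
    return duration_s / 60 * PRICE_PER_MINUTE

# e.g. a 90-second clip:
print(transcription_cost({"audio_transcription_duration": 90.0}))  # roughly 0.009
```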