jetson-arena

LLM inference benchmarks on NVIDIA Jetson AGX Orin 64GB.

Device

| Spec | Value |
|---|---|
| Device | NVIDIA Jetson AGX Orin 64GB |
| Memory | 61.37 GiB (shared CPU+GPU) |
| Platform | JetPack L4T r36.4, tegra, aarch64 |

Results: vLLM 0.17.1

Docker image: narandill/vllm:0.17.0-r36.4.tegra-aarch64-cp312-cu129-24.04

Config: FP16, eager mode (no CUDAGraph), FlashAttention v2

| Model | Quant | TTFT avg (ms) | TTFT min (ms) | Decode (tok/s) | max_model_len |
|---|---|---|---|---|---|
| Qwen3.5-2B | none | 232.9 | 231.8 | 12.3 | 4096 |
| Qwen3.5-9B | none | 310.4 | 304.6 | 8.5 | 2048 |
| Qwen2.5-32B-Instruct-AWQ | AWQ 4-bit | 383.0 | 298.1 | 4.9 | 4096 |
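
To get a feel for what these figures mean in practice, the sketch below turns the TTFT and decode columns into a rough end-to-end time for a single response. It assumes the first token arrives at the average TTFT and every later token at the steady decode rate. The function name `estimate_seconds` and the 256-token example are illustrative and not part of this repository. TTFT also depends on prompt length, which the table does not state, so treat the output as a ballpark only.

```python
# Figures copied from the results table: (TTFT avg in ms, decode rate in tok/s).
RESULTS = {
    "Qwen3.5-2B": (232.9, 12.3),
    "Qwen3.5-9B": (310.4, 8.5),
    "Qwen2.5-32B-Instruct-AWQ": (383.0, 4.9),
}


def estimate_seconds(model: str, output_tokens: int) -> float:
    """Rough wall-clock time for one response (hypothetical helper).

    Assumption: the first token arrives at TTFT and the remaining
    tokens arrive at the steady decode rate.
    """
    ttft_ms, decode_tps = RESULTS[model]
    remaining = max(output_tokens - 1, 0)
    return ttft_ms / 1000.0 + remaining / decode_tps


for name in RESULTS:
    print(f"{name}: ~{estimate_seconds(name, 256):.1f} s for 256 output tokens")
```

For responses of this length the decode rate dominates: the TTFT term contributes well under a second in every row, while the decode term runs to tens of seconds.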

Notes

Raw Data

See results/ for structured JSON benchmark data.
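
A minimal sketch for reading that data, assuming the files sit directly under results/ with a `.json` extension. The schema is not documented here, so the helper (`load_results`, a hypothetical name) only parses each file and makes no assumption about its keys.

```python
import json
from pathlib import Path


def load_results(results_dir: str = "results") -> dict:
    """Map each JSON file name in results_dir to its parsed content."""
    loaded = {}
    # Assumption: benchmark files are top-level *.json files in the directory.
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            loaded[path.name] = json.load(fh)
    return loaded


if __name__ == "__main__":
    for name, data in load_results().items():
        # Print only the top-level type, since the schema is unknown here.
        print(name, type(data).__name__)
```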