L40S is now available on WriftAI
Feb 14, 2024
·1 minute read
L40S is now available on WriftAI.
I want to explain where it fits and why, because the case for it is specific.
Many of the image generation, audio transcription, speech synthesis, and document understanding models running on WriftAI do not need the largest GPUs available. They are small enough to iterate on quickly, cheap enough to run at volume, and good enough for most production tasks, and they tend to land in the size range L40S is built for.
The L40S is NVIDIA's Ada Lovelace data center GPU, with 48GB of GDDR6, 864 GB/s of memory bandwidth, and 4th generation Tensor Cores with FP8 support. The weights of a 13B model in float16 take around 26GB (2 bytes per parameter), and the weights of a 34B model in int8 take around 34GB (1 byte per parameter). Both fit on a single L40S with room left for the KV cache. The FP8 support means quantized inference in this size range runs faster and at a better cost per run than on older architectures. For models below that range, T4 still makes sense. For models in the 13B to 34B range, L40S is the stronger choice on both performance and price.
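The sizing arithmetic above can be sketched in a few lines of Python. This is a rough, weights-only estimate: the function names are hypothetical, 1 GB is taken as 10^9 bytes, and the KV cache, activations, and runtime overhead are deliberately left out, so real headroom will be smaller.

```python
# Weights-only memory estimate. Names here are illustrative, not a WriftAI API.
BYTES_PER_PARAM = {"float16": 2.0, "int8": 1.0}
L40S_MEMORY_GB = 48  # L40S memory capacity quoted in the post


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight footprint in GB: parameters (billions) x bytes per parameter."""
    return params_billions * BYTES_PER_PARAM[dtype]


def headroom_gb(params_billions: float, dtype: str,
                gpu_memory_gb: float = L40S_MEMORY_GB) -> float:
    """Memory left after loading weights; this must also cover the KV cache."""
    return gpu_memory_gb - weight_memory_gb(params_billions, dtype)


if __name__ == "__main__":
    for label, size, dtype in [("13B", 13, "float16"), ("34B", 34, "int8")]:
        print(f"{label} in {dtype}: "
              f"{weight_memory_gb(size, dtype):.0f} GB weights, "
              f"{headroom_gb(size, dtype):.0f} GB headroom on one L40S")
```

How much of that headroom the KV cache consumes depends on the model architecture, context length, and batch size, so treat the result as a first filter and not a guarantee.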
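For 70B models and larger, the L40S is not the right fit. A 70B model needs around 70GB for its weights even in int8, which already exceeds the card's 48GB, and models at that scale need more memory bandwidth than GDDR6 can provide.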
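One common back-of-the-envelope way to see the bandwidth limit: in single-stream decoding, each generated token reads roughly every weight once, so tokens per second is capped at about memory bandwidth divided by weight size. The sketch below applies that simplification. It ignores batching, KV cache reads, and compute limits, the function name is hypothetical, and the 70GB row is a what-if, since weights that large would not fit on a single card anyway.

```python
# Bandwidth-bound ceiling on single-stream decode speed (a simplification).
L40S_BANDWIDTH_GB_S = 864  # L40S memory bandwidth quoted in the post


def max_decode_tokens_per_s(weight_gb: float,
                            bandwidth_gb_s: float = L40S_BANDWIDTH_GB_S) -> float:
    """Upper bound: every token streams all weights from GPU memory once."""
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    # Weight sizes in GB: 13B float16, 34B int8, and a hypothetical 70B int8.
    for label, weight_gb in [("13B float16", 26), ("34B int8", 34), ("70B int8", 70)]:
        print(f"{label}: at most {max_decode_tokens_per_s(weight_gb):.1f} tokens/s")
```

The ceiling falls in direct proportion to model size, which is why larger models lean on GPUs with higher-bandwidth memory.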
Pricing is at wrift.ai/pricing.