What is Whisper?

Whisper is OpenAI's open-source automatic speech recognition (ASR) model, released in 2022. Trained on 680,000 hours of multilingual audio, it is widely regarded as one of the most accurate general-purpose transcription models available, especially for accented speech, technical terminology, and noisy environments.

Whisper comes in multiple sizes: tiny, base, small, medium, large, and large-v3-turbo. Larger models are more accurate but slower. Large-v3-turbo hits the best accuracy-to-speed balance for real-time applications.

OpenAI Whisper API vs Groq Whisper

OpenAI hosts Whisper as part of its API. Groq also hosts Whisper, specifically large-v3-turbo, but runs it on custom LPU (Language Processing Unit) hardware that Groq designed for inference workloads. The result is dramatically faster transcription at the same accuracy level.

| Dimension | OpenAI Whisper API | Groq Whisper (large-v3-turbo) |
|---|---|---|
| Typical latency | 500–1500ms | ~150–300ms |
| Model | whisper-1 (large-v2) | whisper-large-v3-turbo |
| Accuracy | Excellent | Excellent (slightly better model) |
| Hardware | Standard GPU clusters | Custom LPU chips |
| Streaming | No | No |
| Language support | 99 languages | 99 languages |
| Pricing (API) | $0.006/min | ~$0.004/min |
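The pricing gap compounds with usage. A minimal sketch of the monthly spend at each per-minute rate, using the prices above; the 90 minutes/day figure is an illustrative assumption, not a usage statistic:

```python
# Per-minute prices from the comparison table above.
OPENAI_PRICE_PER_MIN = 0.006   # whisper-1, $ per minute of audio
GROQ_PRICE_PER_MIN = 0.004     # whisper-large-v3-turbo, $ per minute (approx.)

def monthly_cost(price_per_min: float, minutes_per_day: float, days: int = 30) -> float:
    """Total transcription spend over a month of daily dictation."""
    return price_per_min * minutes_per_day * days

# Assumption: a heavy dictation user speaking ~90 minutes per day.
heavy_user_minutes = 90
print(f"OpenAI: ${monthly_cost(OPENAI_PRICE_PER_MIN, heavy_user_minutes):.2f}/month")
print(f"Groq:   ${monthly_cost(GROQ_PRICE_PER_MIN, heavy_user_minutes):.2f}/month")
```

Even at this volume the absolute cost is small on either provider; for dictation, latency is the differentiator, not price.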

Why 200ms vs 1000ms matters for dictation

At 1000ms, you finish speaking, wait a full second, then see the text. That pause breaks the natural mental rhythm — you've already moved on to the next thought. Users often describe it as "just slow enough to be annoying."

At 200ms, the turnaround is imperceptible in normal use. You release the microphone and the text is already there. The experience feels like thought → text with no detectable gap.

This is why AiType specifically chose Groq over the OpenAI Whisper API. The ~250ms total pipeline time (transcription + AI cleanup) is achievable only because the transcription leg runs in ~150–200ms on Groq's LPU hardware.
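The arithmetic behind that claim can be laid out as a simple latency budget. This is a back-of-envelope sketch using the approximate figures from the text; the cleanup figure is an assumed remainder of the ~250ms budget, not a measured number:

```python
# Latency budget for the speech-to-clean-text pipeline described above.
transcription_ms = 175          # Groq leg: midpoint of the ~150-200ms range
cleanup_ms = 75                 # AI cleanup pass (assumed remainder of ~250ms total)
total_ms = transcription_ms + cleanup_ms

# Swap in OpenAI's typical 500-1500ms transcription leg and the same
# pipeline lands well past the point where users notice the pause.
openai_total_range = (500 + cleanup_ms, 1500 + cleanup_ms)

print(f"Groq pipeline:   ~{total_ms}ms")
print(f"OpenAI pipeline: ~{openai_total_range[0]}-{openai_total_range[1]}ms")
```

The transcription leg dominates the budget, which is why the choice of hosting provider, not the cleanup model, determines whether the pipeline feels instant.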

What about local / on-device Whisper?

It's possible to run Whisper locally on your machine using tools like whisper.cpp (optimised for Apple Silicon). On an M3 Pro with the large-v3 model, latency of ~200–400ms is achievable, competitive with Groq. But running a large ASR model locally demands significant CPU/GPU resources, drains battery, and doesn't include an AI cleanup pass. For most users, the cloud approach is simpler and produces better results.

The bottom line

Whisper is the model; Groq is the hardware that makes it fast enough for real-time dictation. AiType uses Groq Whisper large-v3-turbo to deliver ~250ms speech-to-clean-text, which is the fastest cloud-based dictation pipeline with AI cleanup available today.

Experience the ~250ms pipeline

14-day free trial. Mac, Windows, iPhone, Android.