Lightning Whisper MLX

An incredibly fast implementation of Whisper optimized for Apple Silicon.

Mustafa Aljadery

Results

10x faster than Whisper CPP and 4x faster than the current MLX Whisper implementation.

Features

Batched Decoding -> Higher Throughput
Distilled Models -> Faster Decoding (fewer layers)
Quantized Models -> Faster Memory Movement
Coming Soon: Speculative Decoding -> Faster Decoding with Assistant Model
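To make the batched decoding idea concrete, here is a minimal sketch of how a long recording can be split into fixed windows and grouped into batches. It is an illustration only, not this library's actual code: the helper names `window_bounds` and `group_into_batches` are hypothetical, and the only facts assumed are that Whisper models consume 16 kHz audio in 30-second windows.

```python
from typing import Iterator, List, Tuple

SAMPLE_RATE = 16_000     # Whisper models expect 16 kHz audio
WINDOW_SECONDS = 30      # Whisper processes audio in 30-second windows
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def window_bounds(num_samples: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample offsets of consecutive 30-second windows."""
    return [
        (start, min(start + WINDOW_SAMPLES, num_samples))
        for start in range(0, num_samples, WINDOW_SAMPLES)
    ]


def group_into_batches(
    windows: List[Tuple[int, int]], batch_size: int
) -> Iterator[List[Tuple[int, int]]]:
    """Yield lists of at most batch_size windows (hypothetical helper)."""
    for i in range(0, len(windows), batch_size):
        yield windows[i:i + batch_size]


# Ten minutes of audio gives 20 windows; with batch_size=12 they are
# decoded in 2 batches instead of 20 separate passes.
windows = window_bounds(10 * 60 * SAMPLE_RATE)
batch_sizes = [len(b) for b in group_into_batches(windows, batch_size=12)]
print(len(windows), batch_sizes)
```

Larger batches mean fewer passes through the model for the same audio, which is where the throughput gain comes from, at the cost of more memory per pass.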

Installation

Install Lightning Whisper MLX using pip:

pip install lightning-whisper-mlx

Usage

Models:

["tiny", "small", "distil-small.en", "base", "medium",
 "distil-medium.en", "large", "large-v2", "distil-large-v2", "large-v3",
 "distil-large-v3"]

Quantization:

[None, "4bit", "8bit"]
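Since both options are plain strings, a typo would otherwise only surface when the model loads. The sketch below checks a configuration against the two lists above before anything is loaded. The function `check_config` and the constant names are hypothetical and not part of the package; whether every model and quantization pairing is actually available is not verified here.

```python
from typing import Optional

# Values copied from the Models and Quantization lists above.
SUPPORTED_MODELS = [
    "tiny", "small", "distil-small.en", "base", "medium",
    "distil-medium.en", "large", "large-v2", "distil-large-v2", "large-v3",
    "distil-large-v3",
]
SUPPORTED_QUANT = [None, "4bit", "8bit"]


def check_config(model: str, quant: Optional[str] = None) -> None:
    """Raise ValueError if model or quant is not in the documented lists."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unknown model {model!r}; expected one of {SUPPORTED_MODELS}"
        )
    if quant not in SUPPORTED_QUANT:
        raise ValueError(
            f"Unknown quantization {quant!r}; expected one of {SUPPORTED_QUANT}"
        )


check_config("distil-medium.en", quant=None)  # passes silently
check_config("base", quant="4bit")            # passes silently
```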

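Example

        from lightning_whisper_mlx import LightningWhisperMLX

        # Load the base model, unquantized, with a decoding batch size of 12
        whisper = LightningWhisperMLX(model="base", batch_size=12, quant=None)

        # transcribe() returns a dict; the transcript is under the 'text' key
        text = whisper.transcribe(audio_path="/audio.mp3")['text']

        print(text)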
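For more than one file, the same model instance can be reused so the weights are loaded only once. The following is a small sketch under that assumption; `transcribe_many` is a hypothetical helper, not part of the package, and it relies only on the `transcribe(audio_path=...)['text']` call shown in the example above.

```python
from typing import Dict, Iterable


def transcribe_many(whisper, audio_paths: Iterable[str]) -> Dict[str, str]:
    """Map each audio path to its transcript, reusing one model instance.

    `whisper` is any object with a transcribe(audio_path=...) method that
    returns a dict containing a 'text' key, as in the example above.
    """
    return {
        path: whisper.transcribe(audio_path=path)["text"]
        for path in audio_paths
    }


# Usage on Apple Silicon with the package installed (paths are placeholders):
#
#   from lightning_whisper_mlx import LightningWhisperMLX
#   whisper = LightningWhisperMLX(model="base", batch_size=12, quant=None)
#   transcripts = transcribe_many(whisper, ["/audio.mp3"])
#   for path, text in transcripts.items():
#       print(path, text)
```

Passing the model in as an argument, rather than constructing it inside the helper, keeps the loading cost out of the loop and makes the function easy to test with a stand-in object.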

Credits

Mustafa - Creator of Lightning Whisper MLX
Awni - Implementation of Whisper MLX (I built on top of this)
Vaibhav - Inspired me to build this (he created a version optimized for CUDA)