Quantizing Qwen2.5-1.5B to INT4 with Microsoft Olive and Running It Locally on iOS
How I used Microsoft Olive's SelectiveMixedPrecision and ModelBuilder passes to quantize Qwen2.5-1.5B-Instruct to INT4 — with strategic INT8 overrides — and deployed it on iOS using ONNX Runtime GenAI and Flutter.
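As a rough sketch of the pipeline the subtitle describes: Olive runs are driven by a JSON config that chains passes over an input model. The fragment below is illustrative, not the exact config from this post — the pass option values (`algorithm`, `precision`) and the output path are assumptions, and the real `SelectiveMixedPrecision` and `ModelBuilder` options should be checked against the Olive documentation for your installed version.

```json
{
  "input_model": {
    "type": "HfModel",
    "model_path": "Qwen/Qwen2.5-1.5B-Instruct"
  },
  "passes": {
    "mixed_precision": {
      "type": "SelectiveMixedPrecision",
      "algorithm": "k_quant_mixed"
    },
    "builder": {
      "type": "ModelBuilder",
      "precision": "int4"
    }
  },
  "output_dir": "models/qwen2.5-1.5b-int4"
}
```

A config like this is typically executed with the Olive CLI, e.g. `olive run --config config.json`, producing an ONNX model plus the `genai_config.json` that ONNX Runtime GenAI consumes on-device.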