Mistral Medium 3.5 is an open-weight 128B large language model (LLM) for coding, reasoning, and instruction-following, with vision input. Key features include a unified model architecture, a large context window, and multimodal vision input. Best for software developers and engineers, data scientists and analysts, and scientists and researchers.
About Mistral Medium 3.5
Mistral Medium 3.5 is a dense 128-billion-parameter open-weight LLM from Mistral AI that combines instruction-following, reasoning, and coding in a single unified model. It supports a 256K-token context window, multimodal vision input, and configurable reasoning effort. It targets developers building applications that need strong coding performance with the option to self-host.
The core features that matter
- Unified model architecture combining instruction-following, reasoning, and coding in one 128B dense model rather than three separate specialists, with per-request reasoning effort configuration
- Large context window of 256K tokens, enough to hold sizable codebases, long documents, and extended multi-turn conversations in a single context
- Multimodal vision input with a from-scratch vision encoder handling variable image sizes and aspect ratios for document analysis and visual question answering
- Strong coding performance, scoring 77.6% on SWE-Bench Verified and 91.4% on the τ³-Telecom agentic benchmark, suitable for complex coding tasks and agent workflows
- Open weights for self-hosting under a modified MIT license with weights on Hugging Face, runnable on 4 GPUs at Q4 quantization
- Remote cloud agents through Mistral Vibe CLI with asynchronous execution, parallel coding sessions, and GitHub PR integration
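The "4 GPUs at Q4 quantization" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates memory for the weights alone, assuming exactly 4 bits per weight and an even split across GPUs. Real Q4 formats store extra scaling data, and the KV cache and activations need additional memory, so treat the result as a lower bound rather than a hardware requirement.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 128e9  # 128B dense parameters, from the feature list

q4_gb = weight_memory_gb(PARAMS, 4)     # idealized 4-bit quantization
fp16_gb = weight_memory_gb(PARAMS, 16)  # 16-bit weights, for comparison
per_gpu_q4_gb = q4_gb / 4               # even split across the 4 GPUs cited above

print(f"Q4:   {q4_gb:.0f} GB total, {per_gpu_q4_gb:.0f} GB per GPU")
print(f"FP16: {fp16_gb:.0f} GB total")
```

Under these assumptions the quantized weights come to about 64 GB, or roughly 16 GB per GPU before any runtime overhead.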
How it stands out
The frontier LLM space has closed-source leaders (GPT, Claude, Gemini) and open-weight alternatives (Llama, DeepSeek, Qwen). Mistral Medium 3.5's position is open-weight availability combined with strong coding benchmark results. For developers who want self-hosted AI without closed-source pricing or data-flow concerns, open-weight models like Mistral Medium 3.5 provide a genuine alternative to API-only competitors.
The honest qualifier: open-weight models trail frontier closed-source models on some benchmarks while exceeding them on others. Mistral Medium 3.5 is strong on coding specifically but may not match Claude or GPT on all tasks. Self-hosting requires meaningful infrastructure investment (4 GPUs is the minimum, more for production loads), so the cost calculation versus API access depends on usage volume. For organizations with significant AI usage where API costs add up or data residency matters, Mistral Medium 3.5 enables self-hosted deployment. For occasional users, hosted API access remains more practical.
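The volume dependence described above can be made concrete with a simple break-even estimate. This is a minimal sketch, not a cost model: the GPU hourly rate and the blended API price are hypothetical placeholders, and it ignores staffing, power, utilization, and whether four GPUs can actually serve the resulting volume.

```python
def breakeven_million_tokens(monthly_infra_usd: float, api_usd_per_million: float) -> float:
    """Monthly volume (in millions of tokens) at which a fixed
    self-hosting bill equals the equivalent API spend."""
    if api_usd_per_million <= 0:
        raise ValueError("API price must be positive")
    return monthly_infra_usd / api_usd_per_million


# Hypothetical inputs, not taken from the article or any price list:
GPU_HOURLY_USD = 2.00    # assumed rental price per GPU-hour
NUM_GPUS = 4             # minimum GPU count cited above
HOURS_PER_MONTH = 730    # average hours in a month
BLENDED_API_USD = 3.00   # assumed blended API price per million tokens

monthly_infra = GPU_HOURLY_USD * NUM_GPUS * HOURS_PER_MONTH
volume = breakeven_million_tokens(monthly_infra, BLENDED_API_USD)

print(f"Infra: ${monthly_infra:,.0f}/month, "
      f"break-even near {volume:,.0f}M tokens/month")
```

Below the break-even volume the API is cheaper on these numbers; above it, self-hosting starts to pay off, provided the hardware can sustain the load.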
Key Features
Unified Model Architecture.
Large Context Window.
Multimodal Vision Input.
Strong Coding Performance.
Open Weights for Self-Hosting.
Remote Cloud Agents.
Frequently Asked Questions
What is Mistral Medium 3.5?
Mistral Medium 3.5 is a 128-billion-parameter large language model released by Mistral AI in April 2026. It's a dense transformer model with a 256K context window that combines instruction-following, reasoning, and coding in one unified architecture. The model is released as open weights under a modified MIT license and can be self-hosted or accessed via API.
How much does Mistral Medium 3.5 cost?
Mistral Medium 3.5 costs $1.50 per million input tokens and $7.50 per million output tokens through the Mistral API. This is half the input cost of Claude Sonnet 4.6 and 40% cheaper than GPT-4o. For self-hosting, the open weights are free to download under a modified MIT license, though high-revenue enterprises may need a commercial agreement.
Does Mistral Medium 3.5 support image input?
Yes, Mistral Medium 3.5 includes multimodal vision capabilities. It has a vision encoder trained from scratch that accepts both text and image inputs and produces text output. The encoder handles variable image sizes and aspect ratios, making it suitable for document parsing, diagram understanding, screenshot analysis, and visual question answering tasks.
How does Mistral Medium 3.5 compare to other models for coding?
Mistral Medium 3.5 scores 77.6% on SWE-Bench Verified, close to Claude Sonnet 4.6 at 79.6% and ahead of smaller models like Qwen 3.6 at 72.4%. It's designed as a unified model that handles coding, reasoning, and general tasks together, with configurable reasoning effort. The model also powers Mistral Vibe CLI for autonomous coding agents that can open pull requests.
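The API prices quoted in the FAQ ($1.50 per million input tokens, $7.50 per million output tokens) translate into a per-workload estimate with simple arithmetic. The workload below is a hypothetical example chosen for illustration, not a measured usage pattern.

```python
INPUT_USD_PER_MILLION = 1.50    # input price quoted in the FAQ
OUTPUT_USD_PER_MILLION = 7.50   # output price quoted in the FAQ


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in USD for the given token counts."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: 200 requests, each with 8,000 input
# tokens and 1,000 output tokens.
cost = api_cost_usd(200 * 8_000, 200 * 1_000)
print(f"Estimated cost: ${cost:.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, workloads that generate long responses are dominated by the output side of the bill.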




