Recommended Models

Open-weight models benchmarked on Infersec infrastructure, with quality metrics, throughput, and latency data from real hardware runs.

Qwen3-Coder-Next-GGUF

unsloth

80B

UD-IQ2_XXS · GGUF · 21.7 GB

No description available.

★ Recommended for coding, agentic

deepreinforce-ai_Ornith-1.0-35B-GGUF

bartowski

35B

Q4_K_M · GGUF · 19.9 GB

Llamacpp imatrix quantizations of Ornith-1.0-35B by deepreinforce-ai, made with [llama.cpp](https://github.com/ggml-org/llama.cpp/) release [b9781](https://github.com/ggml-org/llama.cpp/releases/tag/b9781). Original model: https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B

Qwopus3.6-35B-A3B-v1-GGUF

Jackrong

35B

Q4_K_M · GGUF · 19.7 GB

Qwopus3.6-35B-A3B-v1 is built on Qwen3.6-35B-A3B, a hybrid sparse MoE (Mixture-of-Experts) model developed by Alibaba Cloud. The base model has 35B total parameters with only 3B active per token, which keeps inference efficient. Architecturally, it combines Gated DeltaNet linear attention with standard gated attention layers and routes tokens across 256 experts. It natively supports a 262k context window and is designed for agentic coding, deep reasoning, and multimodal tasks.

★ Recommended for coding, agentic

Qwable-3.6-35b

Mia-AiLab

35B

Q4_K_M · GGUF · 19.7 GB

Qwable 3.6 35b is a full Hugging Face checkpoint fine-tuned from `unsloth/Qwen3.6-35b` on a cleaned Fable 5-style reasoning and instruction dataset. The goal is to take a strong Qwen 35b base and push it toward more deliberate, structured, trace-like assistant behavior, especially for code, technical reasoning, and instruction-following workflows.

★ Recommended for coding, agentic

Qwen-AgentWorld-35B-A3B-GGUF

unsloth

35B

UD-Q4_K_M · GGUF · 20.6 GB

Qwen-AgentWorld-35B-A3B, quantized with [Unsloth Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf). Links: [Technical Report](http://arxiv.org/abs/2606.24597) | [Blog](https://qwen.ai/blog?id=qwen-agentworld) | [Hugging Face](https://huggingface.co/collections/Qwen/qwen-agentworld) | [ModelScope](https://modelscope.cn/collections/Qwen/Qwen-AgentWorld) | [GitHub](https://github.com/QwenLM/Qwen-AgentWorld) | [Demo](https://qwen.ai/blog?id=qwen-agentworld#interactive-demo-interactive-demo)

GLM-4.7-Flash-GGUF

unsloth

30B

Q4_K_M · GGUF · 17.1 GB

Read the [Run GLM-4.7-Flash guide](https://unsloth.ai/docs/models/glm-4.7-flash).

Jan 21 update: llama.cpp fixed a bug that caused looping and poor outputs. The GGUFs have been updated; re-download the model for much better outputs.

- Repeat penalty: disable it, or set `--repeat-penalty 1.0`.

Z.ai's recommended parameters now give great results:

- General use: `--temp 1.0 --top-p 0.95`
- Tool-calling: `--temp 0.7 --top-p 1.0`
- With llama.cpp, set `--min-p 0.01`, as llama.cpp's default is 0.05.

GLM-4.7-Flash can also be fine-tuned with Unsloth via the [GLM free notebook](https://unsloth.ai/docs/models/glm-4.7-flash#fine-tuning-glm-4.7-flash).

★ Recommended for coding, agentic
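The GLM-4.7-Flash sampling settings quoted above can be collected into a small helper that produces the flag list for a llama.cpp invocation. This is a minimal sketch: the flag names and values come from the model card text, while the `PRESETS` table and the `build_sampling_args` function are hypothetical names introduced here for illustration and are not part of llama.cpp or Unsloth.

```python
# Sampling presets as quoted in the GLM-4.7-Flash card.
# PRESETS and build_sampling_args are illustrative names (assumptions),
# not an API provided by llama.cpp or Unsloth.
PRESETS = {
    "general": {"--temp": 1.0, "--top-p": 0.95},
    "tool-calling": {"--temp": 0.7, "--top-p": 1.0},
}


def build_sampling_args(use_case: str) -> list[str]:
    """Return command-line sampling flags for the given use case."""
    args: list[str] = []
    for flag, value in PRESETS[use_case].items():
        args += [flag, str(value)]
    # The card suggests overriding llama.cpp's min-p default.
    args += ["--min-p", "0.01"]
    # A repeat penalty of 1.0 effectively disables it, per the card.
    args += ["--repeat-penalty", "1.0"]
    return args


if __name__ == "__main__":
    print(" ".join(build_sampling_args("tool-calling")))
```

The returned list can be appended to whatever llama.cpp command is already in use; the model path and binary name are deliberately left out because the listing does not specify them.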

Qwen3-Coder-30B-A3B-Instruct-GGUF

unsloth

30B

Q5_K_M · GGUF · 20.2 GB

No description available.

★ Recommended for coding, agentic

Qwen3.6-28B-REAP20-A3B-GGUF

barozp

28B

Q6_K · GGUF · 21.6 GB

A 20% expert-pruned variant of Qwen3.6-35B-A3B using the REAP method. 28B total parameters with 3B active, providing strong performance at reduced compute cost.

gemma-4-26B-A4B-it-GGUF

unsloth

26B

UD-Q6_K · GGUF · 43.3 GB

No description available.

★ Recommended for coding, agentic

gpt-oss-20b-GGUF

unsloth

20B

Q6_K · GGUF · 22.4 GB

GGUF quantization of OpenAI's gpt-oss-20b, provided by Unsloth. A 20B parameter text generation model compatible with llama.cpp.

gemma-4-12b-it-GGUF

unsloth

12B

Q4_K_M · GGUF · 6.6 GB

Read the [Run Gemma 4 12B guide](https://docs.unsloth.ai/models/gemma-4). See [Unsloth Dynamic 2.0 GGUFs](https://unsloth.ai/docs/basics/unsloth-dynamic-v2.0-gguf) for quantization benchmarks. Gemma 4 12B can now be run and fine-tuned in [Unsloth Studio](https://unsloth.ai/docs/new/studio). All versions of Gemma 4 (GGUF, 16-bit, etc.) are listed [in the Unsloth collection](https://huggingface.co/collections/unsloth/gemma-4).

★ Recommended for coding, agentic

deepreinforce-ai_Ornith-1.0-9B-GGUF

bartowski

Q6_K · GGUF · 14.8 GB

Llamacpp imatrix quantizations of Ornith-1.0-9B by deepreinforce-ai, made with [llama.cpp](https://github.com/ggml-org/llama.cpp/) release [b9781](https://github.com/ggml-org/llama.cpp/releases/tag/b9781). Original model: https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B

LFM2.5-8B-A1B-GGUF

LiquidAI

Q8_0 · GGUF · 8.4 GB

LFM2.5-8B-A1B is a multilingual text generation model from Liquid AI, available in GGUF format for llama.cpp. Supports English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

Qwen3-8B-GGUF

Qwen

Q4_K_M · GGUF · 4.7 GB

No description available.