
Commit 91f051e
Merge remote-tracking branch 'refs/remotes/origin/main'
2 parents f476655 + dd07b0d

3 files changed: 532 additions & 0 deletions

File tree

CHANGELOG.md (95 additions & 0 deletions)

# Changelog

All notable changes to QuantLLM are recorded here. The format follows
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and the project
adheres to [Semantic Versioning](https://semver.org/).

## [Unreleased] — production hardening on top of v2.1.0rc1

### Fixed

- **`is_quantized` no longer lies about the loaded model state.** The
  attribute is now a derived property reading
  `model.config.quantization_config` (and BitsAndBytes layer types) at
  call time; see the first sketch after this list. This fixes three
  concrete bugs in v2.1.0rc1:
  * `from_config_only=True` previously left `_is_quantized=True` even
    though `AutoModelForCausalLM.from_config(...)` returns a
    random-weights model with no quantization. The flag is now `False`
    and a warning is emitted to make the random-weights nature explicit.
  * A missing `bitsandbytes` install used to silently fall through to
    full precision while keeping `_is_quantized=True`. We now log a
    descriptive warning and report `False`.
  * Pre-quantized HF repos that already ship a `quantization_config`
    (GPTQ, AWQ, etc.) are now correctly reported as quantized regardless
    of the user's `quantize=False` flag.
- **`DEFAULT_ARCHITECTURE_FALLBACKS` is now actually consulted.** The
  fallback table introduced by PR #27 was dead code whenever HF returned
  a non-empty `model_type` (i.e. always). `resolve_model_type` now
  checks the table directly and recognises common version-suffix
  patterns (`qwen3` → `qwen2`, `llama4` → `llama`, `phi4` → `phi3`,
  `gemma3` → `gemma2`, etc.); see the second sketch after this list.
- **`register_architecture` class lookup now uses the natural API.**
  Calling `register_architecture("newmodel", base_model_type="llama",
  model_class=NewModel)` previously stored the class under `"newmodel"`
  but looked it up under `"llama"`, so the fallback path silently
  ignored it. The lookup now tries the original `config.model_type`
  first and falls back to the resolved base family, as shown in the
  second sketch after this list.
- Removed an accidentally duplicated `if is_bnb and is_8bit ...` block
  in the existing-quant detection branch of
  `TurboModel.from_pretrained`.
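
A minimal sketch of the derived-property approach, assuming illustrative
attribute names (`self.model` holding the underlying HF model); the real
`bitsandbytes` layer classes are used, but the shipped `TurboModel`
implementation may differ in detail:

```python
class TurboModel:
    """Sketch: only the quantization-state plumbing is shown."""

    def __init__(self, model):
        self.model = model                   # underlying HF model (assumed attribute)
        self._is_quantized_override = False  # set to True by from_gguf

    @property
    def is_quantized(self) -> bool:
        """Derive the state from the loaded model instead of caching a flag."""
        if self._is_quantized_override:
            return True
        config = getattr(self.model, "config", None)
        # Pre-quantized repos (GPTQ, AWQ, bnb, ...) ship a quantization_config.
        if config is not None and getattr(config, "quantization_config", None):
            return True
        # Otherwise look for BitsAndBytes layer types in the module tree.
        try:
            import bitsandbytes as bnb
        except ImportError:
            return False  # no bitsandbytes => nothing was quantized with it
        return any(
            isinstance(module, (bnb.nn.Linear4bit, bnb.nn.Linear8bitLt))
            for module in self.model.modules()
        )
```
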
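In the same spirit, a sketch of the suffix-aware `resolve_model_type`
and the two-step registry lookup from the `register_architecture` fix;
the table subset, `KNOWN_FAMILIES`, and `lookup_model_class` are
illustrative names, not the shipped code:

```python
# Illustrative subset of DEFAULT_ARCHITECTURE_FALLBACKS.
DEFAULT_ARCHITECTURE_FALLBACKS = {
    "qwen3": "qwen2",
    "llama4": "llama",
    "phi4": "phi3",
    "gemma3": "gemma2",
}

# Families the loader natively supports (illustrative).
KNOWN_FAMILIES = {"llama", "qwen2", "phi3", "gemma2", "mistral"}

def resolve_model_type(model_type: str) -> str:
    """Map an unrecognised model_type onto a supported base family."""
    if model_type in KNOWN_FAMILIES:
        return model_type
    if model_type in DEFAULT_ARCHITECTURE_FALLBACKS:
        return DEFAULT_ARCHITECTURE_FALLBACKS[model_type]
    # Version-suffix heuristic: strip a trailing version number and
    # retry, so e.g. "llama4" resolves to "llama".
    base = model_type.rstrip("0123456789")
    if base in KNOWN_FAMILIES:
        return base
    return model_type

def lookup_model_class(config_model_type: str, registry: dict):
    """Try the original model_type first, then the resolved base family."""
    if config_model_type in registry:
        return registry[config_model_type]
    return registry.get(resolve_model_type(config_model_type))
```
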
### Added

- **`TurboModel.is_quantized` public property** plus
  **`TurboModel.report()`** returning a structured dict (`model_id`,
  `params_billion`, `requested_bits`, `effective_loading_bits`,
  `is_quantized`, `quant_method`, `device`, `dtype`, `finetuned`,
  `lora_applied`). Use `report()` to assert programmatically what the
  loader actually produced; a usage sketch follows this list.
- **Pre-quantized repo detection.** Repository names matching
  `*-bnb-4bit`, `*-bnb-8bit`, `*-AWQ`, `*-GPTQ`, `*-INT4`, `*-INT8`,
  `*-FP8`, `*-EETQ`, `*-HQQ`, `*-AQLM` log a friendly hint that the
  embedded `quantization_config` will be honoured rather than
  re-quantized; a matching sketch follows this list.
- **GGUF-only repo hint.** When a name contains `-gguf` / `.gguf`,
  `from_pretrained` warns and points the user at `from_gguf`.
- **Expanded `DEFAULT_ARCHITECTURE_FALLBACKS` table** covering Llama 2/3/4,
  Mistral / Mixtral, Qwen 2 / 2-MoE / 3, Phi / Phi-3 / Phi-4, Gemma /
  Gemma 2 / Gemma 3, Falcon, Cohere / Command-R, DeepSeek (V2/V3),
  OLMo / OLMo 2, SmolLM / SmolLM 2 / SmolLM 3, Yi, StarCoder /
  StarCoder 2, InternLM / InternLM 2, Baichuan, ChatGLM and StableLM.
- **Real CI workflow** at `.github/workflows/ci.yml` running ruff,
  pytest on Python 3.10 / 3.11 / 3.12, and `python -m build` +
  `twine check` on every PR.
- **`pyproject.toml`** providing PEP 517 / 518 build metadata, a
  conservative ruff lint profile and pytest defaults.
- **`.pre-commit-config.yaml`** for local enforcement (whitespace,
  end-of-file fixer, large-file guard, ruff with autofix).
- **`docs/guide/consumer-hardware.md`** documenting expected behaviour
  on every tier of consumer hardware (CPU-only, ≤ 8 GB VRAM,
  12 – 24 GB, Apple Silicon, multi-GPU) and how to inspect the loaded
  state.
- **Regression tests** for every fix above:
  * `tests/test_quantization_state.py` — runtime quantization state
    tracking, `from_config_only` honesty, `report()` schema.
  * `tests/test_resolve_model_type.py` — fallback table consultation,
    family-suffix matching, registry-class lookup ergonomics.
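
A usage sketch for `report()`; the import path and model id are assumed
for illustration, but the dict keys match the schema listed above:

```python
from quantllm import TurboModel  # import path assumed

model = TurboModel.from_pretrained("unsloth/Llama-3.2-1B-bnb-4bit")
info = model.report()

# Assert what the loader actually produced, not what was requested.
assert info["is_quantized"] is True
assert info["effective_loading_bits"] == 4
print(info["quant_method"], info["device"], info["dtype"])
```
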
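And a minimal sketch of the pre-quantized name detection, assuming an
illustrative helper name (`looks_prequantized`) built on `fnmatch`
against the suffix patterns listed above:

```python
from fnmatch import fnmatch

# Suffix patterns from the list above (matched case-insensitively).
PREQUANT_PATTERNS = [
    "*-bnb-4bit", "*-bnb-8bit", "*-awq", "*-gptq", "*-int4",
    "*-int8", "*-fp8", "*-eetq", "*-hqq", "*-aqlm",
]

def looks_prequantized(repo_id: str) -> bool:
    """Heuristic: does the repo name advertise an embedded quantization?"""
    name = repo_id.lower()
    return any(fnmatch(name, pattern) for pattern in PREQUANT_PATTERNS)

assert looks_prequantized("unsloth/Llama-3.2-1B-bnb-4bit")
assert not looks_prequantized("meta-llama/Llama-3.2-1B")
```
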
### Changed

- `TurboModel.__repr__` now reads from the new `is_quantized` property
  and degrades gracefully when `num_parameters()` is unavailable
  (mocked / lazily-loaded models).
- `TurboModel.from_gguf` now sets `_is_quantized_override = True`
  rather than mutating an attribute the type system thought was a
  property; this is functionally identical but more honest about the
  contract.
- The "bitsandbytes not installed" warning now explains how to install
  it and explicitly states that loading falls back to full precision.

## [2.0.0] — 2025-12-21

Initial public release of the `turbo()` API and the GGUF / ONNX / MLX
export pipeline. See the GitHub
[releases page](https://github.com/codewithdark-git/QuantLLM/releases/tag/v2.0.0)
for the full notes.

docs/images/animation.gif (127 KB)