- Update examples after 0.12.0 release
- Add missing Quark 0.11 weight patterns for ChatGLM3 output layer
- Support Qwen2.5-VL pre-quantized models in qwen.py
- Fix incorrect batch responses when using multiple prompts
- Harden CUDA error checking across the codebase
- Allow pruned models for prefill
- Add small follow-up changes to pruned-model prefill support