Releases: microsoft/onnxruntime-genai
v0.12.2
- Update examples after 0.12.0 release
- Add missing Quark 0.11 weight patterns for ChatGLM3 output layer
- Support Qwen2.5-VL pre-quantized models in qwen.py
- Fix incorrect batch responses when using multiple prompts
- Harden CUDA error checking across the codebase
- Allow pruned models for prefill
- Add small changes after pruning prefill
v0.12.1
v0.12.0
What's Changed
- Update versions after making 0.11.0 branch by @kunal-vaishnavi in #1867
- Fix guidance usage in continuous decoding by @kunal-vaishnavi in #1870
- Fix HelloPhi C# example by @kunal-vaishnavi in #1871
- Fix regex by @apsonawane in #1875
- Update extensions commit by @apsonawane in #1874
- Revert removal of eps_without_if_support by @xiaofeihan1 in #1878
- Fix condition for NPU by @apsonawane in #1880
- Model builder refactoring by @tianleiwu in #1862
- Add lintrunner to format code by @tianleiwu in #1884
- Remove empty submodule leftover. by @xkszltl in #1883
- Fix build for lack of RTLD_DI_ORIGIN support by @jaeyoonjung in #1888
- Enable graph capture for webgpu by @qjia7 in #1848
- Generic shared emb_tokens/lm_head implementation by @jixiongdeng in #1885
- Fix bug in Squeeze for getting the value of total_seq_len by @Honry in #1886
- Extra_options disable_qkv_fusion to untie qkv_projs from upstream choice by @jixiongdeng in #1893
- Fix mac pipeline by @apsonawane in #1904
- whisper: Support a variant of the whisper pipeline where encoder / decoder are stateful. by @RyanMetcalfeInt8 in #1857
- Add model builder for Qwen2_5_VLTextModel by @tianleiwu in #1882
- Integrate FARA-7B model by @apsonawane in #1902
- Fix gpt-oss model export by @apsonawane in #1861
- OpenVINO: Add support for model caching via 'cache_dir' provider option by @RyanMetcalfeInt8 in #1900
- WinML - Remove the inclusive Microsoft.WindowsAppSDK.ML range check by @chrisdMSFT in #1907
- Run the model in text mode by @apsonawane in #1908
- Update extensions commit by @apsonawane in #1914
- Fix gpt-oss export by @apsonawane in #1915
- Support Olive new uint8 quantization format by @xiaoyu-work in #1916
- Disable CUDA graph for Phi LongRoPE models with IF nodes on TRT-RTX by @anujj in #1921
- Add support for CUDA and CPU arch for Qwen-2.5-VL and Fara-7B by @apsonawane in #1919
- Add Gemma-3 vision tutorial to ONNX Runtime GenAI by @kunal-vaishnavi in #1793
- Quark GPT-OSS support by @thpereir in #1903
- Fix sliding window alignment regression in QNN models by @apsonawane in #1938
- AMD RyzenAI EP Support by @akholodnamdcom in #1935
- Update README by @natke in #1934
- [RyzenAI] Non-pruned models backward compatibility by @akholodnamdcom in #1942
- [VitisAI] EP loader by @akholodnamdcom in #1918
- Set default top_k and top_p if it is None by @xiaoyu-work in #1944
- Ensure dlls are signed in the c and nuget packages. by @baijumeswani in #1947
- Bump torch from 2.7.1 to 2.7.1+cpu in /test/python/directml/torch by @dependabot[bot] in #1868
- Add linker flags for 16 KB page size on Android by @sheetalarkadam in #1860
- Only manually load DLLs if onnxruntime.dll is not already loaded. by @chemwolf6922 in #1800
- Add a doc showing how to run GPT OSS 20B with WebGPU by @natke in #1945
- Add C#, Java, and Objective-C APIs for Config by @kunal-vaishnavi in #1946
- Fix GatherBlockQuantized node to support symmetric quantized LM_HEAD by @sushraja-msft in #1951
- Fix QMoE blockwise quantization support for TRT-RTX execution provider by @anujj in #1926
- Revert "Add a doc showing how to run GPT OSS 20B with WebGPU" by @kunal-vaishnavi in #1950
- Add custom model path support for unit tests by @mpasumarthi-git in #1917
- fix: patch llguidance to remove reference to ring crate by @sanaa-hamel-microsoft in #1948
- Implement graph models for EPs by @qjia7 in #1895
- Update handling EOS token id detection by @kunal-vaishnavi in #1925
- Remove onnxruntime-genai-cuda from the foundry package by @baijumeswani in #1954
- Include linux builds in the foundry ort-genai package by @baijumeswani in #1955
- Support pre-registered plug-in NvTensorRtRtx execution provider library by @anujj in #1889
- [RyzenAI] Linux compatibility fixes by @akholodnamdcom in #1959
- Use cuda 12.8 to build ort-genai by @baijumeswani in #1960
- Bump protobuf from 5.29.5 to 6.33.5 in /test/python by @dependabot[bot] in #1961
- Add RAII wrappers for ORT Model Editor API types by @qjia7 in #1953
- Rewrite all examples using standardization by @kunal-vaishnavi in #1939
- Add versioning to the onnxruntime-genai-cuda.dll by @baijumeswani in #1965
- [Build][Packaging] macOS packaging to skip building x86_64 by @baijumeswani in #1966
- Sync packaging changes with ONNX Runtime by @baijumeswani in #1967
- Release 0.12.0 cherry-pick PR by @baijumeswani in #1978
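The defaulting behavior added in #1944 ("Set default top_k and top_p if it is None") can be illustrated with a minimal pure-Python sketch. The helper name and the default values below are hypothetical, chosen only to show the pattern; they are not the library's actual code or defaults.

```python
# Illustrative defaults only; not the values used by onnxruntime-genai.
DEFAULT_TOP_K = 50
DEFAULT_TOP_P = 1.0

def resolve_sampling_options(top_k=None, top_p=None):
    """Fall back to defaults when the caller leaves a sampling option unset.

    Using `is None` (rather than truthiness) matters: top_k=0 or top_p=0.0
    are legitimate explicit values and must not be silently replaced.
    """
    top_k = DEFAULT_TOP_K if top_k is None else top_k
    top_p = DEFAULT_TOP_P if top_p is None else top_p
    return top_k, top_p
```

In the real library these options are passed through the generator's search options; the sketch only captures the None-handling fix itself.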
New Contributors
- @xkszltl made their first contribution in #1883
- @jaeyoonjung made their first contribution in #1888
- @jixiongdeng made their first contribution in #1885
- @Honry made their first contribution in #1886
- @thpereir made their first contribution in #1903
- @akholodnamdcom made their first contribution in #1935
- @sheetalarkadam made their first contribution in #1860
- @sanaa-hamel-microsoft made their first contribution in #1948
Full Changelog: v0.11.4...v0.12.0
v0.11.4
What's Changed
- WinML - Remove the inclusive Microsoft.WindowsAppSDK.ML range check by @chrisdMSFT in #1907
- Run the model in text mode by @apsonawane in #1908
Full Changelog: v0.11.3...v0.11.4
v0.11.3
What's Changed
- Model builder refactoring by @tianleiwu in #1862
- Add lintrunner to format code by @tianleiwu in #1884
- Remove empty submodule leftover. by @xkszltl in #1883
- Fix build for lack of RTLD_DI_ORIGIN support by @jaeyoonjung in #1888
- Enable graph capture for webgpu by @qjia7 in #1848
- Generic shared emb_tokens/lm_head implementation by @jixiongdeng in #1885
- Fix bug in Squeeze for getting the value of total_seq_len by @Honry in #1886
- Extra_options disable_qkv_fusion to untie qkv_projs from upstream choice by @jixiongdeng in #1893
- Fix mac pipeline by @apsonawane in #1904
- whisper: Support a variant of the whisper pipeline where encoder / decoder are stateful. by @RyanMetcalfeInt8 in #1857
- Add model builder for Qwen2_5_VLTextModel by @tianleiwu in #1882
- Integrate FARA-7B model by @apsonawane in #1902
- Set version as 0.11.3 by @kunal-vaishnavi in #1905
New Contributors
- @xkszltl made their first contribution in #1883
- @jaeyoonjung made their first contribution in #1888
- @jixiongdeng made their first contribution in #1885
- @Honry made their first contribution in #1886
Full Changelog: v0.11.2...v0.11.3
v0.11.2
What's Changed
- Revert removal of eps_without_if_support by @xiaofeihan1 in #1878
- Fix condition for NPU by @apsonawane in #1880
- Set version as 0.11.2 by @kunal-vaishnavi in #1881
Full Changelog: v0.11.1...v0.11.2
v0.11.1
What's Changed
- Cherry pick guidance fix into 0.11.1 release by @kunal-vaishnavi in #1872
- Set version as 0.11.1 by @kunal-vaishnavi in #1873
- Fix regex by @apsonawane in #1876
Full Changelog: v0.11.0...v0.11.1
v0.11.0
What's Changed
- ADO - Update WinML build pipeline by @chrisdMSFT in #1768
- Fix CMakeLists.txt auto-detection of library directory by @anujj in #1774
- Fix new/delete override and Enable cuda kernel test in Windows by @tianleiwu in #1772
- Use abbreviation for TensorRT RTX EP by @kunal-vaishnavi in #1763
- Add trust remote code option to model builder by @kunal-vaishnavi in #1766
- Support block-wise quant in qmoe op by @apsonawane in #1746
- Change the status for TRT-RTX EP by @gaugarg-nv in #1780
- Cherry-Pick changes from rel 0.10.0 back to main. by @chrisdMSFT in #1782
- Fix /CETCOMPAT Usage for Cross-Compiling by @sayanshaw24 in #1779
- Provide distributed version of improved TopK kernel by @hariharans29 in #1710
- [TRT-RTX] Disable KV cache re-computation for Phi models by @gaugarg-nv in #1787
- [CUDA] Add high-performance Top-K kernels and online benchmarking by @tianleiwu in #1748
- Change shared indices array type from float to int by @hariharans29 in #1789
- Enable bfloat16 multi-modal models by @kunal-vaishnavi in #1786
- Disable lmhead while prompt processing by @qti-ashimaj in #1762
- Introduce support for dynamic batching by @baijumeswani in #1662
- Generate pyd type info by @chemwolf6922 in #1742
- Add trt-rtx c packages in c example by @anujj in #1794
- [CUDA] Fix build with CUDA >= 12.9 by @tianleiwu in #1802
- [CUDA] topk kernels v2 by @tianleiwu in #1798
- Add prefill Chunking Support for NvTensorRtRtx and Cuda Providers by @anujj in #1765
- Add TRT-RTX EP support, keep NvTensorRtRtx as user facing name, and force QDQ by @anujj in #1791
- [CUDA] Add static assert to suppress windows build warnings by @tianleiwu in #1804
- Revert "Generate pyd type info" by @baijumeswani in #1805
- [QNN] Support continuous decoding by @baijumeswani in #1808
- ADO Pipeline - nuget_winml_package_reference_version is configured at build time. by @chrisdMSFT in #1811
- Update version to 0.11.0-dev by @baijumeswani in #1815
- Add Support For Tokenizer Options by @sayanshaw24 in #1785
- Fix exit call in README example by @justinchuby in #1823
- Add tokenizer APIs for accessing important ids by @kunal-vaishnavi in #1822
- Use correct classes for config-only usage in model builder by @kunal-vaishnavi in #1828
- Fix packaging pipeline by @baijumeswani in #1829
- Add missing tokenizer methods in java by @baijumeswani in #1833
- Add run options to ONNX Runtime GenAI by @kunal-vaishnavi in #1795
- Avoid Processing EOS Token During Continuous Decoding by @baijumeswani in #1814
- Fix nuget packaging pipeline for dev builds by @baijumeswani in #1837
- Add tool normalization for tool calling by @kunal-vaishnavi in #1838
- Refactor past_present_share_buffer logic into reusable function by @anujj in #1839
- Fix nuget packaging pipeline by @baijumeswani in #1841
- Add enable_webgpu_graph in extra_options by @qjia7 in #1788
- Update tool normalization in ORT GenAI by @kunal-vaishnavi in #1842
- Support RotaryEmbedding in GQA for webgpu ep by @xiaofeihan1 in #1847
- Enable guidance ff tokens for faster inference by @JC1DA in #1803
- Support pre-registered plug-in cuda execution provider library by @baijumeswani in #1850
- ADO: Update pipeline to publish onnxruntime-genai. for relwithdebinfo builds. by @chrisdMSFT in #1855
- Layer-wise KV Cache Allocation for Models with Alternating Attention Patterns by @anujj in #1832
- Mpasumarthi/nvtrt test suite by @mpasumarthi-git in #1756
- bugfix: fix a memory issue in Whisper by @fs-eire in #1859
- Add disable cuda graph when num_beams > 1 and fix set_provider_option bug by @anujj in #1846
- Mixed precision export support for gptq quantized model by @rM-planet in #1853
- Enable If Node Support for TRT-RTX in Phi-3.5/Phi-4 LongRoPE Models by @anujj in #1851
- Fix handling EOS token id detection by @kunal-vaishnavi in #1849
- Ensure Consistent Tool Calling JSON Serialization and Deserialization by @sayanshaw24 in #1863
- Add C# binding for GetNextTokens by @kunal-vaishnavi in #1865
- Set version as 0.11.0 by @kunal-vaishnavi in #1866
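The EOS handling work above (#1849, later updated in #1925) deals with the fact that a model's eos_token_id may be configured as a single integer or as a list of integers. A minimal pure-Python sketch of that normalization idea follows; the function name is hypothetical and this is not the library's actual implementation.

```python
def normalize_eos_token_ids(eos_token_id):
    """Normalize an eos_token_id config value to a set of token ids.

    Models commonly declare eos_token_id either as one int or as a list
    of ints; downstream generation loops then just test membership.
    """
    if eos_token_id is None:
        return set()
    if isinstance(eos_token_id, int):
        return {eos_token_id}
    return set(eos_token_id)

def is_eos(token, eos_token_id):
    # A generation loop would stop (or mask the token) on a hit.
    return token in normalize_eos_token_ids(eos_token_id)
```

The membership-set form keeps the per-token check O(1) regardless of how many EOS ids the model declares.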
New Contributors
- @hariharans29 made their first contribution in #1710
- @qti-ashimaj made their first contribution in #1762
- @chemwolf6922 made their first contribution in #1742
- @qjia7 made their first contribution in #1788
- @xiaofeihan1 made their first contribution in #1847
- @JC1DA made their first contribution in #1803
- @mpasumarthi-git made their first contribution in #1756
- @rM-planet made their first contribution in #1853
Full Changelog: v0.10.0...v0.11.0
v0.10.0
What's Changed
- Enable continuous decoding for NvTensorRtRtx EP by @anujj in #1697
- Use updated Decoder API with skip_special_tokens by @sayanshaw24 in #1722
- Update extensions to include memleak fix by @baijumeswani in #1724
- Support batch processing for whisper example by @jiafatom in #1723
- Update onnxruntime_extensions dependency version by @baijumeswani in #1725
- Include C++ header in native nuget and fix compiler warnings by @baijumeswani in #1727
- Update Microsoft.Extensions.AI to 9.8.0 by @rogerbarreto in #1689
- Update Extensions commit for Qwen 2.5 Chat Template Tools Fix by @sayanshaw24 in #1730
- Whisper Truncation Extensions Commit Update by @sayanshaw24 in #1735
- Enable Cuda Graph for TensorRtRtx by default by @anujj in #1734
- Update sampling benchmark by @tianleiwu in #1729
- Add Windows WinML x64 build workflow by @chrisdMSFT in #1740
- Fix CUDA synchronization issue between ORT-GenAI and TRT-RTX inference by @anujj in #1733
- Hello WindowsML by @chrisdMSFT in #1711
- [CUDA] sampling kernel improvements by @tianleiwu in #1732
- Update GitHub Actions to latest versions by @snnn in #1749
- Update WinML version to 1.8.2091 by @nieubank in #1750
- Address macos packaging pipeline issues by @baijumeswani in #1747
- ProviderOptions level device filtering and APIs to configure model level device filtering by @vortex-captain in #1744
- Fix string indexing bug with Phi-4 mm tokenization by @kunal-vaishnavi in #1751
- Fix TRT-RTX EP regression by @gaugarg-nv in #1754
- Fix typo in C API header by @kunal-vaishnavi in #1753
- Enable WinML by default in ADO pipelines by @chrisdMSFT in #1755
- Change default build configuration to 'relwithdebinfo' by @baijumeswani in #1757
- Pin cmake and vcpkg versions in macOS workflows by @snnn in #1760
- Add TRT_RTX support for onnxruntime-genai-trt-rtx wheel by @anujj in #1736
- rel-0.10.0 by @chrisdMSFT in #1767
- Microsoft.ML.OnnxRuntimeGenAI.WinML.props by @chrisdMSFT in #1776
- Warning fix - ort_genai.h by @chrisdMSFT in #1778
- Microsoft.ML.OnnxRuntimeGenAI.targets by @chrisdMSFT in #1781
Full Changelog: v0.9.2...v0.10.0
v0.9.2
This release fixes a pre-processing bug with Phi-4 multimodal.
Full Changelog: v0.9.1...v0.9.2