Releases: modelscope/dash-infer
v2.1.0
What's Changed
- [JSON mode]: FormatEnforcer use cudaMallocHost for scores buffer by @WangNorthSea in #56
- [A16W8 & A8W8]: Further optimize the Ampere A16W8 fused GEMM kernel; fix LoRA doc by @wyajieha in #58
- [Multimodal]: Support LLM quantization with GPTQ and AXWY by @x574chen in #60
- [PKG]: Reduce package size by only compiling flash-attn src with hdim128 by @laiwenzh in #62
- [MOE]: add high performance moe kernel; fix a16w8 compile bug for sm<80 by @laiwenzh in #67
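The A16W8 items above refer to weight-only quantization: weights stored as int8 with a per-channel scale, activations kept in 16-bit float, and dequantization fused into the GEMM. A minimal numpy sketch of the idea (function names `quantize_weights_a16w8` and `a16w8_matmul` are illustrative, not DashInfer's API):

```python
import numpy as np

def quantize_weights_a16w8(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization of a weight matrix."""
    # Choose a scale so the largest |w| in each column maps to 127.
    scale = np.abs(w).max(axis=0) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero columns
    w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_int8, scale

def a16w8_matmul(x: np.ndarray, w_int8: np.ndarray, scale: np.ndarray):
    """Dequantize-on-the-fly GEMM: y ≈ x @ w, with w reconstructed as w_int8 * scale."""
    return x @ (w_int8.astype(np.float32) * scale)
```

A fused kernel performs the `w_int8 * scale` reconstruction inside the GEMM inner loop rather than materializing the dequantized matrix, which is what saves memory bandwidth.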
Full Changelog: v2.0.0...v2.1.0
v2.0.0
What's Changed
- engine: stop and release the model when the engine is released, and remove a deprecated lock
- sampling: generate_op heavily modified, remove dependency on global tensors
- prefix cache: some bug fixes, improve eviction performance
- json mode: update lmfe-cpp patch, add process_logits, sampling with top_k top_p
- span-attention: move span_attn decoderReshape to init
- lora: add docs, fix typo
- ubuntu: add ubuntu dockerfile, fix install dir err
- bugfix: fix multi-batch rep_penalty bug
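The json-mode item above mentions sampling with top_k and top_p. A sketch of the standard logits filtering that these parameters imply (the function `filter_top_k_top_p` is illustrative, not DashInfer's implementation):

```python
import numpy as np

def filter_top_k_top_p(logits: np.ndarray, top_k: int = 0, top_p: float = 1.0):
    """Mask logits outside the top-k set and the top-p (nucleus) set with -inf."""
    logits = logits.copy()
    if top_k > 0:
        # Keep only the k largest logits.
        kth = np.sort(logits)[-top_k]
        logits[logits < kth] = -np.inf
    if top_p < 1.0:
        order = np.argsort(logits)[::-1]
        probs = np.exp(logits[order] - logits[order][0])
        probs /= probs.sum()
        cum = np.cumsum(probs)
        # Keep the smallest prefix whose cumulative mass reaches top_p
        # (the best token is always kept).
        cutoff = np.searchsorted(cum, top_p) + 1
        logits[order[cutoff:]] = -np.inf
    return logits
```

Sampling then draws from the softmax over the surviving logits; in json mode the format enforcer additionally masks tokens that would violate the grammar before this step.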
Full Changelog: v1.3.0...v2.0.0
v2.0.0-rc3
Some bugfixes:
- fix uuid crash issue
- update LoRA implementation
- set page size by param
- delete deprecated files
v2.0.0-rc2
release script: reduce python wheel size (#46)
v1.3.0
Highlight
- Support Baichuan-7B and Baichuan2-7B & 13B by @WangNorthSea in #38
Full Changelog: v1.2.1...v1.3.0
v1.2.1
v1.2.0
Expand context length to 32K and support flash attention on the Intel AVX-512 platform
- remove currently unsupported cache mode
- examples: update qwen prompt template, add print func to examples
- support glm-4-9b-chat
- change to size_t to avoid overflow when the sequence is long
- update README since we support 32k context length
- Add flash attention on intel-avx512 platform
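The core idea behind flash attention, on CPU as on GPU, is processing K/V in blocks with an online softmax so the full attention matrix is never materialized. A minimal numpy sketch of that tiled recurrence (`streaming_attention` is illustrative; the project's AVX-512 kernel is a vectorized C++ implementation, not this code):

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference: full softmax(QK^T / sqrt(d)) V."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def streaming_attention(q, k, v, block=4):
    """Process K/V block by block, keeping a running max and sum (online softmax)."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full((n, 1), -np.inf)      # running row maximum
    l = np.zeros((n, 1))              # running softmax denominator
    acc = np.zeros((n, v.shape[-1]))  # unnormalized output accumulator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        corr = np.exp(m - m_new)      # rescale previous partial results
        p = np.exp(s - m_new)
        l = l * corr + p.sum(axis=-1, keepdims=True)
        acc = acc * corr + p @ vb
        m = m_new
    return acc / l
```

Because only one block of scores lives in registers/cache at a time, the working set stays small regardless of sequence length, which is what makes 32K contexts practical.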