Models

This page lists zipformer pre-trained models and their performance on commonly used open-source test sets, along with usage instructions.

Configuration¶

zipformer models are available in xlarge, large, medium, and small variants, the model configurations are as follows.

Variant	Parameters	Configration
xlarge	300M	--num-encoder-layers 2,2,4,5,4,2 --feedforward-dim 512,1024,2048,3072,2048,1024 --encoder-dim 192,384,768,1024,768,384 --encoder-unmasked-dim 192,256,320,512,320,256
large	150M	--num-encoder-layers 2,2,4,5,4,2 --feedforward-dim 512,768,1536,2048,1536,768 --encoder-dim 192,256,512,768,512,256 --encoder-unmasked-dim 192,192,256,320,256,192
medium	65M	--num-encoder-layers 2,2,3,4,3,2 --feedforward-dim 512,768,1024,1536,1024,768 --encoder-dim 192,256,384,512,384,256 --encoder-unmasked-dim 192,192,256,256,256,192
small	25M	--num-encoder-layers 2,2,2,2,2,2 --feedforward-dim 512,768,768,768,768,768 --encoder-dim 192,256,256,256,256,256 --encoder-unmasked-dim 192,192,192,192,192,192

Chinese-English Models¶

Non-Streaming Models¶

Name	Parameters	Download	aishell test 1 / 2	wenetspeech test-net/meeting	Common Voice zh	kespeech test	librispeech test-clean / other	gigaspeech test	Common voice en	tedium test
xlarge-ctc	300M	HuggingFace	1.61 / 2.7	5.35 / 6.39	8.26	5.74	3.51 / 7.78	14.53	28.57	15.07
large-ctc	150M	HuggingFace	2.51 / 3.51	6.23 / 6.67	7.96	8.95	2.62 / 5.17	10.73	12.99	10.11
large-rnnt	150M	HuggingFace	2.42 / 3.55	6.7 / 7.81	7.92	8.88	2.27 / 4.64	10.08	11.27	9.82
medium-ctc	65M	HuggingFace	3.08 / 3.98	7.08 / 7.62	9.2	11.23	3.01 / 6.06	11.22	15.28	10.38
medium-rnnt	65M	HuggingFace	2.67 / 3.67	6.79 / 7.33	8.97	10.67	2.61 / 5.36	10.56	12.94	10.06
small-ctc	35M	HuggingFace	4.82 / 5.5	10.09 / 11.3	12.76	16.07	5.12 / 10.67	22.27	23.7	11.04
small-rnnt	35M	HuggingFace	3.92 / 4.74	9.09 / 10.57	11.86	14.84	3.78 / 8.65	16.1	18.21	6.79

Directory Structure¶

Below are the files included in non-streaming zipformer speech recognition models:

.
├── ctc.fp16.onnx
├── ctc.int8.onnx
├── ctc.onnx
├── data
│   ├── tokens.txt
│   └── zh-en-8776.vocab
├── decoder.onnx
├── encoder.fp16.onnx
├── encoder.int8.onnx
├── encoder.onnx
├── jit_model.pt
├── joiner.fp16.onnx
├── joiner.int8.onnx
├── joiner.onnx
└── model.pt

model.pt is the PyTorch state_dict of the model, shared by CTC and Transducer models. It can be used to export jit scripted and ONNX models, or as a starting point for fine-tuning.
jit_model.pt is the jit scripted model, shared by CTC and Transducer models, suitable for deployment with torch.jit.script.
ctc.onnx, ctc.fp16.onnx, ctc.int8.onnx are the exported CTC head ONNX models, corresponding to float32, float16, and int8 data types respectively (note: some models may not include fp16 and int8 variants).
encoder.onnx, encoder.fp16.onnx, encoder.int8.onnx are the exported Transducer encoder ONNX models, corresponding to float32, float16, and int8 data types respectively (note: some models may not include fp16 and int8 variants). decoder.onnx is the exported Transducer decoder ONNX model (note: decoder does not have fp16 and int8 variants). joiner.onnx, joiner.fp16.onnx, joiner.int8.onnx are the exported Transducer joiner ONNX models, corresponding to float32, float16, and int8 data types respectively (note: some models may not include fp16 and int8 variants).
The data directory contains the BPE model and tokens.

Usage¶

This section only covers usage methods supported by the zipformer repository (i.e., Python-based usage). For inference and deployment on other languages, operating systems, and hardware platforms, please refer to the deployment section.

Command Line¶

The examples below use zipformer-large; other models follow the same pattern.

CTC head inference

Using downloaded modelsAuto-downloading from HuggingFace

# jit script model
zipformer inference \
    --model zipformer-large/jit_model.pt \
    --ctc 1 \
    --model-type jit \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx model
zipformer inference \
    --model zipformer-large/ctc.onnx \
    --ctc 1 \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx fp16 model
zipformer inference \
    --model zipformer-large/ctc.fp16.onnx \
    --ctc 1 \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx int8 model
zipformer inference \
    --model zipformer-large/ctc.int8.onnx \
    --ctc 1 \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# jit script model
zipformer inference \
    --hf-model pkufool/zipformer-large \
    --ctc 1 \
    --model-type jit \
    data/en.wav data/zh.wav

# onnx model
zipformer inference \
    --hf-model pkufool/zipformer-large \
    --ctc 1 \
    --model-type onnx \
    data/en.wav data/zh.wav

# onnx fp16 model
zipformer inference \
    --hf-model pkufool/zipformer-large \
    --ctc 1 \
    --dtype fp16 \
    --model-type onnx \
    data/en.wav data/zh.wav

# onnx int8 model
zipformer inference \
    --hf-model pkufool/zipformer-large \
    --ctc 1 \
    --dtype int8 \
    --model-type onnx \
    data/en.wav data/zh.wav

Transducer head inference

Using downloaded modelsAuto-downloading from HuggingFace

# jit script model
zipformer inference \
    --model zipformer-large/jit_model.pt \
    --model-type jit \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx model
zipformer inference \
    --encoder zipformer-large/encoder.onnx \
    --decoder zipformer-large/decoder.onnx \
    --joiner zipformer-large/joiner.onnx \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx fp16 model
zipformer inference \
    --encoder zipformer-large/encoder.fp16.onnx \
    --decoder zipformer-large/decoder.onnx \
    --joiner zipformer-large/joiner.fp16.onnx \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx int8 model
zipformer inference \
    --encoder zipformer-large/encoder.int8.onnx \
    --decoder zipformer-large/decoder.onnx \
    --joiner zipformer-large/joiner.int8.onnx \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# jit script model
zipformer inference \
    --hf-model pkufool/zipformer-large \
    --model-type jit \
    data/en.wav data/zh.wav

# onnx model
zipformer inference \
    --hf-model pkufool/zipformer-large \
    --model-type onnx \
    data/en.wav data/zh.wav

# onnx fp16 model
zipformer inference \
    --hf-model pkufool/zipformer-large \
    --dtype fp16 \
    --model-type onnx \
    data/en.wav data/zh.wav

# onnx int8 model
zipformer inference \
    --hf-model pkufool/zipformer-large \
    --dtype int8 \
    --model-type onnx \
    data/en.wav data/zh.wav

Python API¶

CTC head inference

Using downloaded modelsAuto-downloading from HuggingFace

from zipformer import inference

# jit script model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="jit",
    model="zipformer-large/jit_model.pt",
    tokens="data/tokens.txt",
    ctc=True,
)

# onnx model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    model="zipformer-large/ctc.onnx",
    tokens="data/tokens.txt",
    ctc=True,
)

# onnx fp16 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    model="zipformer-large/ctc.fp16.onnx",
    tokens="data/tokens.txt",
    ctc=True,
)

# onnx int8 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    model="zipformer-large/ctc.int8.onnx",
    tokens="data/tokens.txt",
    ctc=True,
)

from zipformer import inference

# jit script model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="jit",
    hf_model="pkufool/zipformer-large",
    ctc=True,
)

# onnx model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large",
    ctc=True,
)

# onnx fp16 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large",
    ctc=True,
    dtype="fp16",
)

# onnx int8 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large",
    ctc=True,
    dtype="int8",
)

Transducer head inference

Using downloaded modelsAuto-downloading from HuggingFace

from zipformer import inference

# jit script model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="jit",
    model="zipformer-large/jit_model.pt",
    tokens="data/tokens.txt",
)

# onnx model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    encoder="zipformer-large/encoder.onnx",
    decoder="zipformer-large/decoder.onnx",
    joiner="zipformer-large/joiner.onnx",
    tokens="data/tokens.txt",
)

# onnx fp16 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    encoder="zipformer-large/encoder.fp16.onnx",
    decoder="zipformer-large/decoder.onnx",
    joiner="zipformer-large/joiner.fp16.onnx",
    tokens="data/tokens.txt",
)

# onnx int8 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    encoder="zipformer-large/encoder.int8.onnx",
    decoder="zipformer-large/decoder.onnx",
    joiner="zipformer-large/joiner.int8.onnx",
    tokens="data/tokens.txt",
)

from zipformer import inference

# jit script model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="jit",
    hf_model="pkufool/zipformer-large",
)

# onnx model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large",
)

# onnx fp16 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large",
    dtype="fp16",
)

# onnx int8 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large",
    dtype="int8",
)

Streaming Models¶

The evaluation results below use --chunk-size 16 --left-context-frames 128

Name	Parameters	Download	aishell test 1 / 2	wenetspeech test-net/meeting	Common Voice zh	kespeech test	librispeech test-clean / other	gigaspeech test	Common voice en	tedium test
large-ctc	150M	HuggingFace	3.78 / 4.71	8.65 / 10.54	11.8	15.35	3.74 / 8.5	12.32	19.7	10.92
large-rnnt	150M	HuggingFace	3.53 / 4.48	8.31 / 10.27	11.99	14.83	3.26 / 7.51	11.77	17.53	10.82
medium-ctc	65M	HuggingFace	4.46 / 5.09	9.74 / 11.21	12.68	11.26	4.28 / 9.4	12.96	21.77	11.26
medium-rnnt	65M	HuggingFace	3.9 / 4.79	9.05 / 10.82	12.41	17.89	3.64 / 8.08	12.13	18.97	10.9
small-ctc	35M	HuggingFace	6.7 / 7.24	12.92 / 16.45	17.18	23.32	19.4 / 29.66	26.18	33.52	17.67
small-rnnt	35M	HuggingFace	5.69 / 6.26	12.06 / 16.13	16.51	22.29	8.15 / 16.91	19.77	28.54	14.23

Directory Structure¶

Below are the files included in streaming zipformer speech recognition models:

.
├── ctc-chunk-16-left-64.fp16.onnx
├── ctc-chunk-16-left-64.int8.onnx
├── ctc-chunk-16-left-64.onnx
├── ctc-chunk-32-left-128.fp16.onnx
├── ctc-chunk-32-left-128.int8.onnx
├── ctc-chunk-32-left-128.onnx
├── ctc-chunk-64-left-256.fp16.onnx
├── ctc-chunk-64-left-256.int8.onnx
├── ctc-chunk-64-left-256.onnx
├── data
│   ├── tokens.txt
│   └── zh-en-8776.vocab
├── decoder-chunk-16-left-64.onnx
├── decoder-chunk-32-left-128.onnx
├── decoder-chunk-64-left-256.onnx
├── encoder-chunk-16-left-64.fp16.onnx
├── encoder-chunk-16-left-64.int8.onnx
├── encoder-chunk-16-left-64.onnx
├── encoder-chunk-32-left-128.fp16.onnx
├── encoder-chunk-32-left-128.int8.onnx
├── encoder-chunk-32-left-128.onnx
├── encoder-chunk-64-left-256.fp16.onnx
├── encoder-chunk-64-left-256.int8.onnx
├── encoder-chunk-64-left-256.onnx
├── jit_model-chunk-16-left-64.pt
├── jit_model-chunk-32-left-128.pt
├── jit_model-chunk-64-left-256.pt
├── joiner-chunk-16-left-64.fp16.onnx
├── joiner-chunk-16-left-64.int8.onnx
├── joiner-chunk-16-left-64.onnx
├── joiner-chunk-32-left-128.fp16.onnx
├── joiner-chunk-32-left-128.int8.onnx
├── joiner-chunk-32-left-128.onnx
├── joiner-chunk-64-left-256.fp16.onnx
├── joiner-chunk-64-left-256.int8.onnx
├── joiner-chunk-64-left-256.onnx
└── model.pt

The exported streaming models come in three latency variants: chunk-size=16, left-context-frames=64 (320ms latency), chunk-size=32, left-context-frames=128 (640ms latency), chunk-size=64, left-context-frames=256 (1280ms latency).

model.pt is the PyTorch state_dict of the model, shared by CTC and Transducer models. It can be used to export jit scripted and ONNX models, or as a starting point for fine-tuning.
jit_model-chunk-{chunk-size}-left-{left-context-frames}.pt is the jit scripted model, shared by CTC and Transducer models, suitable for deployment with torch.jit.script.
ctc-chunk-{chunk-size}-left-{left-context-frames}.onnx, ctc-chunk-{chunk-size}-left-{left-context-frames}.fp16.onnx, ctc-chunk-{chunk-size}-left-{left-context-frames}.int8.onnx are the exported CTC head ONNX models, corresponding to float32, float16, and int8 data types respectively (note: some models may not include fp16 and int8 variants).
encoder-chunk-{chunk-size}-left-{left-context-frames}.onnx, encoder-chunk-{chunk-size}-left-{left-context-frames}.fp16.onnx, encoder-chunk-{chunk-size}-left-{left-context-frames}.int8.onnx are the exported Transducer encoder ONNX models, corresponding to float32, float16, and int8 data types respectively (note: some models may not include fp16 and int8 variants). decoder-chunk-{chunk-size}-left-{left-context-frames}.onnx is the exported Transducer decoder ONNX model (note: decoder does not have fp16 and int8 variants). joiner-chunk-{chunk-size}-left-{left-context-frames}.onnx, joiner-chunk-{chunk-size}-left-{left-context-frames}.fp16.onnx, joiner-chunk-{chunk-size}-left-{left-context-frames}.int8.onnx are the exported Transducer joiner ONNX models, corresponding to float32, float16, and int8 data types respectively (note: some models may not include fp16 and int8 variants).
The data directory contains the BPE model and tokens.

Usage¶

This section only covers usage methods supported by the zipformer repository (i.e., Python-based usage). For inference and deployment on other languages, operating systems, and hardware platforms, please refer to the deployment section.

Command Line¶

The examples below use zipformer-large-streaming with chunk-size=32, left-context-frames=128; other models follow the same pattern.

CTC head inference

Using downloaded modelsAuto-downloading from HuggingFace

# jit script model
zipformer inference \
    --model zipformer-large-streaming/jit_model-chunk-32-left-128.pt \
    --ctc 1 \
    --streaming 1 \
    --model-type jit \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx model
zipformer inference \
    --model zipformer-large-streaming/ctc-chunk-32-left-128.onnx \
    --ctc 1 \
    --streaming 1 \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx fp16 model
zipformer inference \
    --model zipformer-large-streaming/ctc-chunk-32-left-128.fp16.onnx \
    --ctc 1 \
    --streaming 1 \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx int8 model
zipformer inference \
    --model zipformer-large-streaming/ctc-chunk-32-left-128.int8.onnx \
    --ctc 1 \
    --streaming 1 \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# jit script model
zipformer inference \
    --hf-model pkufool/zipformer-large-streaming \
    --ctc 1 \
    --streaming 1 \
    --model-type jit \
    data/en.wav data/zh.wav

# onnx model
zipformer inference \
    --hf-model pkufool/zipformer-large-streaming \
    --ctc 1 \
    --streaming 1 \
    --model-type onnx \
    data/en.wav data/zh.wav

# onnx fp16 model
zipformer inference \
    --hf-model pkufool/zipformer-large-streaming \
    --ctc 1 \
    --streaming 1 \
    --dtype fp16 \
    --model-type onnx \
    data/en.wav data/zh.wav

# onnx int8 model
zipformer inference \
    --hf-model pkufool/zipformer-large-streaming \
    --ctc 1 \
    --streaming 1 \
    --dtype int8 \
    --model-type onnx \
    data/en.wav data/zh.wav

Transducer head inference

Using downloaded modelsAuto-downloading from HuggingFace

# jit script model
zipformer inference \
    --model zipformer-large-streaming/jit_model-chunk-32-left-128.pt \
    --streaming 1 \
    --model-type jit \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx model
zipformer inference \
    --encoder zipformer-large-streaming/encoder-chunk-32-left-128.onnx \
    --decoder zipformer-large-streaming/decoder-chunk-32-left-128.onnx \
    --joiner zipformer-large-streaming/joiner-chunk-32-left-128.onnx \
    --streaming 1 \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx fp16 model
zipformer inference \
    --encoder zipformer-large-streaming/encoder-chunk-32-left-128.fp16.onnx \
    --decoder zipformer-large-streaming/decoder-chunk-32-left-128.onnx \
    --joiner zipformer-large-streaming/joiner-chunk-32-left-128.fp16.onnx \
    --streaming 1 \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# onnx int8 model
zipformer inference \
    --encoder zipformer-large-streaming/encoder-chunk-32-left-128.int8.onnx \
    --decoder zipformer-large-streaming/decoder-chunk-32-left-128.onnx \
    --joiner zipformer-large-streaming/joiner-chunk-32-left-128.int8.onnx \
    --streaming 1 \
    --model-type onnx \
    --tokens data/tokens.txt \
    data/en.wav data/zh.wav

# jit script model
zipformer inference \
    --hf-model pkufool/zipformer-large-streaming \
    --streaming 1 \
    --model-type jit \
    data/en.wav data/zh.wav

# onnx model
zipformer inference \
    --hf-model pkufool/zipformer-large-streaming \
    --streaming 1 \
    --model-type onnx \
    data/en.wav data/zh.wav

# onnx fp16 model
zipformer inference \
    --hf-model pkufool/zipformer-large-streaming \
    --streaming 1 \
    --dtype fp16 \
    --model-type onnx \
    data/en.wav data/zh.wav

# onnx int8 model
zipformer inference \
    --hf-model pkufool/zipformer-large-streaming \
    --streaming 1 \
    --dtype int8 \
    --model-type onnx \
    data/en.wav data/zh.wav

Python API¶

CTC head inference

Using downloaded modelsAuto-downloading from HuggingFace

from zipformer import inference

# jit script model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="jit",
    model="zipformer-large-streaming/jit_model-chunk-32-left-128.pt",
    tokens="data/tokens.txt",
    streaming=True,
    ctc=True,
)

# onnx model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    model="zipformer-large-streaming/ctc-chunk-32-left-128.onnx",
    tokens="data/tokens.txt",
    streaming=True,
    ctc=True,
)

# onnx fp16 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    model="zipformer-large-streaming/ctc-chunk-32-left-128.fp16.onnx",
    tokens="data/tokens.txt",
    streaming=True,
    ctc=True,
)

# onnx int8 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    model="zipformer-large-streaming/ctc-chunk-32-left-128.int8.onnx",
    tokens="data/tokens.txt",
    streaming=True,
    ctc=True,
)

from zipformer import inference

# jit script model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="jit",
    hf_model="pkufool/zipformer-large-streaming",
    streaming=True,
    ctc=True,
)

# onnx model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large-streaming",
    streaming=True,
    ctc=True,
)

# onnx fp16 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large-streaming",
    streaming=True,
    ctc=True,
    dtype="fp16",
)

# onnx int8 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large-streaming",
    streaming=True,
    ctc=True,
    dtype="int8",
)

Transducer head inference

Using downloaded modelsAuto-downloading from HuggingFace

from zipformer import inference

# jit script model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="jit",
    model="zipformer-large-streaming/jit_model-chunk-32-left-128.pt",
    tokens="data/tokens.txt",
    streaming=True,
)

# onnx model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    encoder="zipformer-large-streaming/encoder-chunk-32-left-128.onnx",
    decoder="zipformer-large-streaming/decoder-chunk-32-left-128.onnx",
    joiner="zipformer-large-streaming/joiner-chunk-32-left-128.onnx",
    tokens="data/tokens.txt",
    streaming=True,
)

# onnx fp16 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    encoder="zipformer-large-streaming/encoder-chunk-32-left-128.fp16.onnx",
    decoder="zipformer-large-streaming/decoder-chunk-32-left-128.onnx",
    joiner="zipformer-large-streaming/joiner-chunk-32-left-128.fp16.onnx",
    tokens="data/tokens.txt",
    streaming=True,
)

# onnx int8 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    encoder="zipformer-large-streaming/encoder-chunk-32-left-128.int8.onnx",
    decoder="zipformer-large-streaming/decoder-chunk-32-left-128.onnx",
    joiner="zipformer-large-streaming/joiner-chunk-32-left-128.int8.onnx",
    tokens="data/tokens.txt",
    streaming=True,
)

from zipformer import inference

# jit script model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="jit",
    hf_model="pkufool/zipformer-large-streaming",
    streaming=True,
)

# onnx model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large-streaming",
    streaming=True,
)

# onnx fp16 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large-streaming",
    streaming=True,
    dtype="fp16",
)

# onnx int8 model
results = inference(
    ["data/en.wav", "data/zh.wav"],
    model_type="onnx",
    hf_model="pkufool/zipformer-large-streaming",
    streaming=True,
    dtype="int8",
)

Models

Configuration¶

Chinese-English Models¶

Non-Streaming Models¶

Directory Structure¶

Usage¶

Command Line¶

Python API¶

Streaming Models¶

Directory Structure¶

Usage¶

Command Line¶

Python API¶

Comments