
# Kaldi nnet3

Check `steps/libs/nnet3/xconfig` in the Kaldi source tree for the implementation.

Reference: [How to Config Kaldi nnet3 (in Simplified Chinese)](http://166.111.134.19:8081/tangzy/files/trp_nnet3_config.pdf)

## Index

- [Descriptor](#descriptor)
- [basic_layers](#basic_layers)
- [lstm](#lstm)
- [stats_layer](#stats_layer)
- [convolution](#convolution)
- [attention](#attention)
- [gru](#gru)
- [composite_layers](#composite_layers)
- [trivial_layers](#trivial_layers)

## Descriptor

A descriptor describes how to glue node outputs together as the input of a particular network node.

Summary of descriptors

- `<descriptor> ::= <node-name>`
- `<descriptor> ::= Append(<descriptor>, <descriptor> [, <descriptor> ... ] )`
- `<descriptor> ::= Sum(<descriptor>, <descriptor>)`
- `<descriptor> ::= Const(<value>, <dimension>)`
- `<descriptor> ::= Scale(<scale>, <descriptor>)`
- `<descriptor> ::= Failover(<descriptor>, <descriptor>)`
- `<descriptor> ::= IfDefined(<descriptor>)`
- `<descriptor> ::= Offset(<descriptor>, <t-offset> [, <x-offset> ] )`
- `<descriptor> ::= Switch(<descriptor>, <descriptor> [, <descriptor> ...])`
- `<descriptor> ::= Round(<descriptor>, <t-modulus>)`
- `<descriptor> ::= ReplaceIndex(<descriptor>, <variable-name>, <value>)`, where
`<variable-name> = <t|x>`
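As a toy illustration of these semantics (plain Python, not Kaldi code; the names and data here are invented for the sketch), the core descriptors can be modeled as operations on per-frame vectors, with `Offset` shifting the frame index and `Append` concatenating frame-by-frame:

```python
# Toy model of descriptor semantics over a sequence of per-frame vectors.
# This is an illustrative sketch, not Kaldi's implementation.

frames = {t: [float(t), float(t) * 10] for t in range(-5, 6)}  # fake node output

def offset(node, t_off):
    # Offset(x, k): at output frame t, read the node's frame t + k
    return lambda t: node(t + t_off)

def append(*descs):
    # Append(a, b, ...): concatenate the vectors frame-by-frame
    return lambda t: [v for d in descs for v in d(t)]

def scale(s, desc):
    # Scale(s, x): multiply every element by s
    return lambda t: [s * v for v in desc(t)]

def sum_(a, b):
    # Sum(a, b): element-wise sum (dimensions must match)
    return lambda t: [x + y for x, y in zip(a(t), b(t))]

node = frames.__getitem__
spliced = append(offset(node, -1), node, offset(node, 1))
print(spliced(0))   # frames -1, 0, 1 concatenated: [-1.0, -10.0, 0.0, 0.0, 1.0, 10.0]
print(sum_(scale(0.5, node), scale(0.5, node))(2))  # [2.0, 20.0]
```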

In `xconfig`, `input@-3` means `Offset(input, -3)`

If you don't specify the `input` attribute, it defaults to `[-1]`, which refers to
the output of the previous layer.

`Append(-1, 0, 1)` is a shortcut for `Append(Offset(prev_layer, -1), prev_layer, Offset(prev_layer, 1))`

Note that `-1` and `[-1]` are different: `-1` is a time offset relative to the current frame, while `[-1]` refers to the previous layer.
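Offsets compose across layers: a stack of spliced layers needs input context equal to the sum of the per-layer offsets. A small arithmetic sketch (plain Python, illustrative only):

```python
# Total temporal context of a stack of spliced layers, each taking
# input=Append(<offsets>) of the previous layer's output.
def total_context(layer_offsets):
    left = sum(min(offs) for offs in layer_offsets)
    right = sum(max(offs) for offs in layer_offsets)
    return left, right

# Three layers, each spliced with Append(-3,0,3):
print(total_context([[-3, 0, 3]] * 3))  # (-9, 9): 9 frames of context per side
```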

## <a name="basic_layers"></a>basic_layers

- **input**: `basic_layers.XconfigInputLayer`
- Attributes:
- `name=<str>`
- `dim=<int>`
- Example usages:

```
input name=input dim=40
input name=ivector dim=100
```
- **output**: `basic_layers.XconfigTrivialOutputLayer`
- Attributes:
- `name=<str>`
- `input=<str>`: `[-1]` refers to the most recent layer
- `objective-type=<linear|quadratic>`: quadratic is for regression
- `output-delay=<int>`: shifts the frames at the output, which increases latency
- Comments: No transformation involved
- Example usages:

```
output name=output input=Append(input@-1, input@0, input@1, ReplaceIndex(ivector, t, 0))
```
- **output-layer**: `basic_layers.XconfigOutputLayer`
- Attributes:
- `name=<str>`
- `input=<str>`: `[-1]` refers to the most recent layer
- `dim=<int>`: normally equals the number of pdfs
- `bottleneck-dim=<int>`: if specified, use a linear component instead of an
affine one, constrained to be orthonormal
- `orthonormal-constraint=<float>`: only used if `bottleneck-dim` is set
- `include-log-softmax=<true|false>`: `false` is useful for chain models
- `objective-type=<linear|quadratic>`: quadratic is for regression
- `learning-rate-factor=<float>`: use `0.5/xent_regularize` for output layers
in chain models
- `max-change=<float>`: limits how much the parameter matrix can change per iteration
- `l2-regularize=<float>`
- `output-delay=<int>`: shifts the frames at the output, which increases latency
- `ng-affine-options`: supply options to affine layers
- `ng-linear-options`: supply options to linear layers (if `bottleneck-dim`
is supplied)
- `param-stddev=<float>`
- `bias-stddev=<float>`

- Comments: No transformation involved


- Example usages:

```
output name=output input=Append(input@-1, input@0, input@1, ReplaceIndex(ivector, t, 0))
```
- **relu, renorm, tanh, sigmoid, batchnorm, so, dropout**:
`basic_layers.XconfigBasicLayer`
- Combinations of these nonlinearities:
- **relu-layer**
- **relu-renorm-layer**
- **relu-batchnorm-dropout-layer**
- **relu-dropout-layer**
- **relu-batchnorm-layer**
- **relu-batchnorm-so-layer**
- **batchnorm-so-relu-layer**
- **sigmoid-layer**
- **tanh-layer**

- Attributes:
- `input=<str>`
- `dim=<int>`
- `self-repair-scale=<float>`
- `target-rms=<float>`
- `ng-affine-options`
- `ng-linear-options`
- `dropout-proportion=<float>`
- `dropout-per-dim=<true|false>`
- `dropout-per-dim-continuous=<true|false>`
- `add-log-stddev=<true|false>`
- `bias-stddev=<float>`
- `l2-regularize=<float>`
- `learning-rate-factor=<float>`
- `max-change=<float>`

- Example Usages:

```
relu-renorm-layer name=layer1 dim=1024 input=Append(-3,0,3)

sigmoid-layer name=layer1 dim=1024 input=Append(-3,0,3)
```

- **fixed-affine-layer** : `basic_layers.XconfigFixedAffineLayer`
- Attributes:
- `input=<str>`
- `dim=<int>`
- `affine-transform-file=<path>`: output path; this file is written, not read
- `delay=<int>`: optional delay for the `output-node` in `init.config`
- `write-init-config=<true|false>`
- Example usages:

```
fixed-affine-layer name=lda input=Append(-2,-1,0,1,2,ReplaceIndex(ivector, t, 0)) affine-transform-file=foo/bar/lda.mat
```

- **affine-layer** : `basic_layers.XconfigAffineLayer`
- Attributes:
- `input=<str>`
- `dim=<int>`
- `param-stddev=<float>`: this has to be initialized to
`1/sqrt(input_dim)`
- `bias-stddev=<float>`
- `bias-mean=<float>`
- `max-change=<float>`
- `l2-regularize=<float>`
- `learning-rate-factor=<float>`
- `ng-affine-options`
- Example usages:

```
affine-layer name=affine input=Append(-2,-1,0,1,2,ReplaceIndex(ivector, t, 0))
```

- **idct-layer** : `basic_layers.XconfigIdctLayer`
- Attributes:
- `input=<str>`
- `dim=<int>`
- `cepstral-lifter=<float>`: liftering coefficient
- `affine-transform-file=<path>`: output path; this file is written, not read
- Example usages:

```
idct-layer name=idct dim=40 cepstral-lifter=22 affine-transform-file=foo/bar/idct.mat
```

## `lstm`

- **lstm-layer**: `lstm.XconfigLstmLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `l2-regularize=<float>`
- `decay-time=<float>`
- Example usages:

```
lstm-layer name=lstm1 input=[-1] delay=-3
```

- **lstmp-layer, lstmp-batchnorm-layer**: `lstm.XconfigLstmpLayer`
- Comments: use the `fast-*` versions below
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `dropout-proportion=<float>`
- `dropout-per-frame=<true|false>`
- `l2-regularize=<float>`
- `decay-time=<float>`
- Example usages:

```
lstmp-layer name=lstm1 input=[-1] delay=-3
```

- **fast-lstm-layer, fast-lstm-batchnorm-layer**: `lstm.XconfigFastLstmLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `clipping-threshold=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `delay=<int>`
- `lstm-nonlinearity-options`
- `ng-affine-options`
- `l2-regularize=<float>`
- `decay-time=<float>`
- Example usages:

```
fast-lstm-layer name=lstm1 input=[-1] delay=-3
```

- **fast-lstmp-layer, fast-lstmp-batchnorm-layer**: `lstm.XconfigFastLstmpLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `lstm-nonlinearity-options`
- `ng-affine-options`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `dropout-proportion=<float>`
- `l2-regularize=<float>`
- `decay-time=<float>`
- Example usages:

```
fast-lstmp-layer name=lstm1 input=[-1] delay=-3

fast-lstmp-layer name=lstm1 input=[-1] delay=-3 cell-dim=1024 \
    recurrent-projection-dim=512 \
    non-recurrent-projection-dim=512
```

- **lstmb-layer**: `lstm.XconfigLstmbLayer`
- Comments: this type of layer already contains **batch normalization**
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `bottleneck-dim=<int>`
- `clipping-threshold=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `orthonormal-constraint=<float>`
- `delay=<int>`
- `lstm-nonlinearity-options`
- `self-scale=<float>`
- `ng-affine-options`
- `l2-regularize=<float>`
- `decay-time=<float>`
- Example usages:
```
lstmb-layer name=lstm1 input=[-1] delay=-3
```

## `stats_layer`

- **stats-layer**: `stats_layer.XconfigStatsLayer`
- Attributes:
- `input=<str>`
- `dim=<int>`
- `config=<str>`: the following statistics are supported
- `mean`
- `mean+stddev`
- `mean+count`
- `mean+stddev+count`
- Example usages:

```
stats-layer name=tdnn1-stats config=mean+stddev(-99:3:9:99) input=tdnn1
```
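As a rough picture of what `mean+stddev` pooling produces (a toy Python sketch of the pooled vector's contents, not Kaldi's exact windowing or the meaning of the `(-99:3:9:99)` range spec):

```python
import math

def mean_stddev_pool(frames):
    # frames: list of equal-length vectors; returns [means..., stddevs...]
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dim)]
    return means + [math.sqrt(v) for v in var]

window = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]   # three 2-dim frames
print(mean_stddev_pool(window))  # [3.0, 2.0, ~1.633, 0.0] (means, then stddevs)
```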

## `convolution`

- **relu-conv-layer, conv-layer, conv-relu-layer, conv-renorm-layer,
relu-conv-renorm-layer, batchnorm-conv-layer, conv-relu-renorm-layer,
batchnorm-conv-relu-layer, relu-batchnorm-conv-layer, relu-batchnorm-noconv-layer,
conv-relu-batchnorm-so-layer, conv-relu-batchnorm-dropout-layer,
conv-relu-dropout-layer**: `convolution.XconfigConvLayer`

- Attributes:
- `input=<str>`
- `height-in=<int>`
- `height-out=<int>`
- `height-subsample-out=<int>`
- `height-offsets=<int>`
- `num-filters-out=<int>`
- `time-offsets=<int>`
- `required-time-offsets=<int>`
- `target-rms=<float>`
- `self-repair-scale=<float>`
- `self-repair-lower-threshold=<float>`
- `param-stddev=<float>`
- `bias-stddev=<float>`
- `max-change=<float>`
- `learning-rate-factor=<float>`
- `use-natural-gradient=<true|false>`
- `rank-in=<int>`
- `rank-out=<int>`
- `num-minibatches-history=<int>`
- `alpha-in`
- `alpha-out`
- `l2-regularize=<float>`
- `dropout-proportion=<float>`
- Example usages:

```
conv-batchnorm-layer name=conv2 height-in=40 height-out=40 num-filters-out=64 \
    height-offsets=-1,0,1 time-offsets=-1,0,1 required-time-offsets=0
```

- **res-block**: `convolution.XconfigResBlock`
- Attributes:
- `input=<str>`
- `height=<int>`
- `num-filters=<int>`
- `num-bottleneck-filters=<int>`
- `time-period=<int>`
- `height-period=<int>`
- `self-repair-scale=<float>`
- `self-repair-lower-threshold1=<float>`
- `self-repair-lower-threshold2=<float>`
- `self-repair-lower-threshold3=<float>`
- `max-change=<float>`
- `allow-zero-padding=<true|false>`
- `bypass-source`
- `param-stddev`
- `bias-stddev`
- `use-natural-gradient`
- `rank-in`
- `rank-out`
- `num-minibatches-history`
- `alpha-in`
- `alpha-out`
- `l2-regularize`
- Example usage:

```
res-block name=res1 num-filters=64 height=32 time-period=1
```
- **res2-block**: `convolution.XconfigRes2Block`
- Attributes:
- `input=<str>`
- `height=<int>`
- `height-in=<int>`
- `height-out=<int>`
- `num-filters=<int>`
- `num-bottleneck-filters=<int>`
- `time-period=<int>`
- `self-repair-scale=<float>`
- `self-repair-lower-threshold1=<float>`
- `self-repair-lower-threshold2=<float>`
- `self-repair-lower-threshold3=<float>`
- `max-change=<float>`
- `allow-zero-padding=<true|false>`
- `param-stddev`
- `bias-stddev`
- `use-natural-gradient`
- `rank-in`
- `rank-out`
- `num-minibatches-history`
- `alpha-in`
- `alpha-out`
- `l2-regularize`
- Example usage:

```
res2-block name=res1 num-filters=64 height=32 time-period=1
```

- **channel-average-layer**: `convolution.ChannelAverageLayer`
- Attributes:
- `input=<str>`
- `dim=<int>`
- Example usage:

```
channel-average-layer name=channel-average input=Append(2, 4, 6, 8)
dim=64
```

## `attention`

- **attention-renorm-layer, attention-relu-renorm-layer,
relu-renorm-attention-layer**: `attention.XconfigAttentionLayer`
- Attributes
- `input=<str>`
- `dim=<int>`
- `max-change=<float>`
- `self-repair-scale=<float>`
- `target-rms=<float>`
- `learning-rate-factor=<float>`
- `ng-affine-options`
- `l2-regularize=<float>`
- `num-left-inputs-required=<int>`
- `num-right-inputs-required=<int>`
- `output-context=<true|false>`
- `time-stride=<int>`
- `num-heads=<int>`
- `key-dim=<int>`
- `key-scale=<float>`
- `value-dim=<int>`
- `num-left-inputs=<int>`
- `num-right-inputs=<int>`
- `dropout-proportion=<float>`
- Example usage

```
attention-renorm-layer num-heads=10 value-dim=50 key-dim=50 time-stride=3 \
    num-left-inputs=5 num-right-inputs=2
```
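For orientation, the layer's core operation is restricted dot-product attention. The following is a generic scaled dot-product sketch in plain Python (the vectors and the single-head setup are invented for illustration; Kaldi's component differs in detail):

```python
import math

def attend(query, keys, values, key_scale):
    # Scaled dot-product attention over a small context window.
    scores = [key_scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                          # subtract max for stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]          # attention weights, sum to 1
    dim = len(values[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, pooled

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # e.g. context offsets -1, 0, +1
vals = [[1.0], [2.0], [3.0]]
w, out = attend(q, keys, vals, key_scale=1.0 / math.sqrt(len(q)))
print(round(sum(w), 6))  # 1.0
```

The key most similar to the query gets the largest weight, so `out` is pulled toward that key's value.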

## `gru`

- **gru-layer**: `gru.XconfigGruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- Example usage:

```
gru-layer name=gru1 input=[-1] delay=-3
```

- **pgru-layer**: `gru.XconfigPgruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`

- Example usage:

```
pgru-layer name=pgru1 input=[-1] delay=-3
```
- **norm-pgru-layer**: `gru.XconfigNormPgruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `dropout-proportion=<float>`
- `dropout-per-frame=<true|false>`

- Example usage:

```
norm-pgru-layer name=norm-pgru1 input=[-1] delay=-3
```

- **opgru-layer** : `gru.XconfigOpgruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- Example usage:

```
opgru-layer name=opgru1 input=[-1] delay=-3
```
- **norm-opgru-layer**: `gru.XconfigNormOpgruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `dropout-proportion=<float>`
- `l2-regularize=<float>`
- `dropout-per-frame=<true|false>`

- Example usage:

```
norm-opgru-layer name=norm-opgru1 input=[-1] delay=-3
```

- **fast-gru-layer** : `gru.XconfigFastGruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `gru-nonlinearity-options`

- Example usage:

```
fast-gru-layer name=gru1 input=[-1] delay=-3
```
- **fast-pgru-layer** : `gru.XconfigFastPgruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `gru-nonlinearity-options`

- Example usage:

```
fast-pgru-layer name=pgru1 input=[-1] delay=-3
```
- **fast-norm-pgru-layer** : `gru.XconfigFastNormPgruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `gru-nonlinearity-options`
- `dropout-proportion=<float>`
- `dropout-per-frame=<true|false>`

- Example usage:

```
fast-norm-pgru-layer name=pgru1 input=[-1] delay=-3
```

- **fast-opgru-layer** : `gru.XconfigFastOpgruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `dropout-per-frame=<true|false>`
- Example usage:

```
fast-opgru-layer name=opgru1 input=[-1] delay=-3
```
- **fast-norm-opgru-layer**: `gru.XconfigFastNormOpgruLayer`
- Attributes:
- `input=<str>`
- `cell-dim=<int>`
- `recurrent-projection-dim=<int>`
- `non-recurrent-projection-dim=<int>`
- `clipping-threshold=<float>`
- `delay=<int>`
- `ng-per-element-scale-options`
- `ng-affine-options`
- `self-repair-scale-nonlinearity=<float>`
- `zeroing-interval=<int>`
- `zeroing-threshold=<float>`
- `gru-nonlinearity-options`
- `dropout-proportion=<float>`
- `dropout-per-frame=<true|false>`

- Example usage:

```
fast-norm-opgru-layer name=opgru1 input=[-1] delay=-3
```

## `composite_layers`

- **tdnnf-layer**: `composite_layers.XconfigTdnnfLayer`
- Attributes:
- `input=<str>`
- `dim=<int>`
- `bottleneck-dim=<int>`
- `bypass-scale=<float>`
- `dropout-proportion=<float>`
- `time-stride=<int>`
- `l2-regularize=<float>`
- `max-change=<float>`
- `self-repair-scale=<float>`
- Example usage:

```
tdnnf-layer name=tdnnf2 dim=1024 bottleneck-dim=128 dropout-proportion=0.0 time-stride=3
```
roughly equivalent to the following:

```
linear-component name=tdnnf2.linear dim=128 orthonormal-constraint=-1.0 \
    input=Append(Offset(tdnnf1, -3), tdnnf1)
relu-batchnorm-dropout-layer name=tdnnf2.affine dim=1024 dropout-proportion=0.0 \
    dropout-per-dim-continuous=true input=Append(0,3)
no-op-component name=tdnnf2 input=Sum(Scale(0.66,tdnnf1), tdnnf2.affine)
```
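The motivation for the bottleneck is parameter reduction. A sketch of the arithmetic for `dim=1024`, `bottleneck-dim=128`, with each matrix splicing two frames of its input (illustrative only; biases ignored):

```python
# Parameter counts: one full spliced affine layer vs. the TDNN-F factorization.
dim, bottleneck = 1024, 128

full = dim * (2 * dim)                              # 1024 x 2048 matrix
factored = bottleneck * (2 * dim) + dim * (2 * bottleneck)
# 128 x 2048 (semi-orthogonal linear) + 1024 x 256 (affine)

print(full, factored)    # 2097152 524288
print(full // factored)  # 4  -> 4x fewer parameters
```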

- **prefinal-layer**: `composite_layers.XconfigPrefinalLayer`
- Attributes:
- `input=<str>`
- `big-dim=<int>`
- `small-dim=<int>`
- `l2-regularize=<float>`
- `max-change=<float>`
- `self-repair-scale=<float>`
- Example usage:

```
prefinal-layer name=prefinal-chain input=prefinal-l l2-regularize=0.02 big-dim=1024 small-dim=256
```

roughly equivalent to the following:

```
relu-batchnorm-layer name=prefinal-chain input=prefinal-l l2-regularize=0.02 dim=1024
linear-component name=prefinal-chain-l dim=256 l2-regularize=0.02 orthonormal-constraint=-1.0
batchnorm-component name=prefinal-chain-batchnorm
```

## `trivial_layers`

- **renorm-component**: `trivial_layers.XconfigRenormComponent`
- Attributes:
- `input=<str>`
- `target-rms=<float>`
- Example usage:

```
renorm-component name=renorm1 input=Append(-3,0,3)
```
- **batchnorm-component**: `trivial_layers.XconfigBatchnormComponent`
- Attributes:
- `input=<str>`
- `target-rms=<float>`
- Example usage:

```
batchnorm-component name=batchnorm input=Append(-3,0,3)
```
- **no-op-component**: `trivial_layers.XconfigNoOpComponent`
- Attributes:
- `input=<str>`
- Example usage:

```
no-op-component name=noop1 input=Append(-3,0,3)
```
- **linear-component**: `trivial_layers.XconfigLinearComponent`
- Attributes:
- `input=<str>`
- `dim=<int>`
- `orthonormal-constraint=<float>`
- `max-change=<float>`
- `l2-regularize=<float>`
- `param-stddev=<float>`
- `learning-rate-factor=<float>`
- Example usage:

```
linear-component name=linear1 dim=1024 input=Append(-3,0,3)
```
- **affine-component**: `trivial_layers.XconfigAffineComponent`
- Attributes:
- `input=<str>`
- `dim=<int>`
- `orthonormal-constraint`
- `max-change=<float>`
- `param-stddev`
- `bias-stddev`
- `l2-regularize`
- Example usage:

```
affine-component name=linear1 dim=1024 input=Append(-3,0,3)
```
- **scale-component**: `trivial_layers.XconfigPerElementScaleComponent`
- Attributes:
- `input=<str>`
- `l2-regularize`
- `max-change=<float>`
- `param-mean`
- `param-stddev`
- `learning-rate-factor`
- Example usage:

```
scale-component name=scale1 input=Append(-3,0,3)
```
- **dim-range-component**: `trivial_layers.XconfigDimRangeComponent`
- Attributes:
- `input=<str>`
- `dim=<int>`
- `dim-offset=<int>`
- Example usage:

```
dim-range-component name=feature1 input=Append(-3,0,3) dim=40 dim-offset=0
```
- **offset-component**: `trivial_layers.XconfigPerElementOffsetComponent`
- Attributes:
- `input=<str>`
- `l2-regularize`
- `max-change=<float>`
- `param-mean`
- `param-stddev`
- `learning-rate-factor`
- Example usage:

```
offset-component name=offset1 input=Append(-3,0,3)
```
- **combine-feature-maps-layer**: `trivial_layers.XconfigCombineFeatureMapsLayer`
- Attributes:
- `input=<str>`
- `num-filters1=<int>`
- `num-filters2=<int>`
- `num-filters3=<int>`
- `height=<int>`
- Example usage:

```
combine-feature-maps-layer name=combine_features1 height=40 num-filters1=1 num-filters2=4

combine-feature-maps-layer name=combine_features1 height=40 num-filters1=1 \
    num-filters2=4 num-filters3=2
```
