DEMO and DEMO2
A bidirectional recurrent neural network model with an attention mechanism for restoring missing inter-word punctuation in unsegmented text.
The model can be trained in two stages (the second stage is optional):
The first stage is trained on punctuation-annotated text.
The optional second stage is trained on punctuation- and pause-annotated text.
In this stage the model learns to combine pause durations with textual features and adapts to the target domain.
A second-stage model with pause durations can be used, for example, to restore punctuation in the output of an automatic speech recognition system.
Training speed with default settings, an optimal Theano installation and a modern GPU should be around 10000 words per second.
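As a rough planning aid, the quoted throughput can be turned into an estimate of how long one pass over a training corpus takes. The sketch below is only arithmetic on that figure; the function name and the 40-million-word corpus size are hypothetical and are not taken from the project.

```python
def estimated_epoch_seconds(corpus_words, words_per_second=10_000):
    """Rough time for one pass over the corpus at a given throughput."""
    if words_per_second <= 0:
        raise ValueError("words_per_second must be positive")
    return corpus_words / words_per_second


# Hypothetical corpus size, for illustration only.
seconds = estimated_epoch_seconds(40_000_000)
print(f"{seconds / 60:.1f} minutes per epoch")  # prints "66.7 minutes per epoch"
```

Actual throughput depends on hardware and settings, so the result is best treated as an order-of-magnitude estimate.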
Example of punctuation-annotated text: to be ,COMMA or not to be ,COMMA that is the question .PERIOD
(Optional) Pause-annotated text files for training and validation of the second-phase model.
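The punctuation-annotated example above can be produced from ordinary text with a small preprocessing step, and turned back into readable text afterwards. The following is a minimal sketch, not the project's own preprocessing: the function names are hypothetical, only the two tokens that appear in the example (,COMMA and .PERIOD) are mapped, and lowercasing is an assumption based on the example being in lowercase.

```python
import re

# Assumption: only the two tokens shown in the example are mapped here.
PUNCT_TO_TOKEN = {",": ",COMMA", ".": ".PERIOD"}
TOKEN_TO_PUNCT = {token: mark for mark, token in PUNCT_TO_TOKEN.items()}


def annotate(text):
    """Convert ordinary punctuated text into space-separated tokens."""
    # Put spaces around each mapped mark so it becomes its own token.
    spaced = re.sub(r"([,.])", r" \1 ", text.lower())
    return " ".join(PUNCT_TO_TOKEN.get(tok, tok) for tok in spaced.split())


def restore(annotated):
    """Turn the token format back into readable text."""
    words = []
    for tok in annotated.split():
        if tok in TOKEN_TO_PUNCT:
            if words:
                words[-1] += TOKEN_TO_PUNCT[tok]  # attach mark to previous word
        else:
            words.append(tok)
    return " ".join(words)


line = annotate("To be, or not to be, that is the question.")
print(line)
print(restore(line))
```

This sketch splits on every comma and period, so decimal numbers and abbreviations would need extra handling before it could be used on real data.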