I tried to use DeepSpeech to transcribe last Friday's stream for editing purposes. It came out reading as though it were from a Markov bot.

"it is again stir so one of their god lichonin ... and had you know six in the morning the spies they came to me that quickly but he realized it was it was this and the box tom that is a piece of part ... god blue desperate busy sea libraries the niobrara with lucid programming hospitally not cluttered the bissextile"

@jakob I was surprised to find that DeepSpeech doesn't seem to be using a good language model, as it frequently produces obscure words and even weird combinations of letters. In the end, I found vosk from alphacep. It's quite accurate even when used by a non-native speaker :ablobowo:


@wzhd This time it wrote me a song:

"here
our
we're
how are your
oh wow
oh
wow
her
no
our our own
terror
how long
oh
our share

her mom
oh oh"

I think this is my fault, though. I should read the documentation.


@jakob I find the test_microphone.py example a good place to start. It handles the sample rate and format, so I don't need to worry about converting codecs or getting the timing wrong. The first time I talked into the microphone, I think words were recognized one by one with little latency, and the partial result sometimes changed to make it more likely to be an English sentence.
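
For anyone curious how the partial results end up being revised, here is a rough sketch of the loop, not the actual test_microphone.py code. It assumes a recognizer object with vosk's KaldiRecognizer methods (AcceptWaveform, PartialResult, Result, FinalResult), which return JSON strings. The function name transcribe_stream and the (kind, text) tuples are made up for illustration.

```python
import json


def transcribe_stream(recognizer, chunks):
    """Feed raw PCM chunks to a vosk-style recognizer.

    Yields ("partial", text) whenever the running guess changes and
    ("final", text) once the recognizer decides an utterance has ended.
    `recognizer` is assumed to behave like vosk.KaldiRecognizer.
    """
    last_partial = ""
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # End of an utterance: Result() holds the settled text.
            text = json.loads(recognizer.Result()).get("text", "")
            last_partial = ""
            if text:
                yield ("final", text)
        else:
            # Mid-utterance: the partial guess may be rewritten later.
            partial = json.loads(recognizer.PartialResult()).get("partial", "")
            if partial and partial != last_partial:
                last_partial = partial
                yield ("partial", partial)
    # Flush whatever is still buffered after the last chunk.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield ("final", text)
```

With the real library this would probably be called with something like KaldiRecognizer(Model("model"), 16000), where "model" is a placeholder path to a downloaded model and 16000 has to match the sample rate of the audio, plus an iterator of 16-bit mono PCM byte chunks.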
