Improving Accuracy for OpenAI’s Whisper

We can use prompts to improve our Whisper transcriptions.

We can add “–initial_prompt” to our command like the following.

--initial_prompt "Computer Historical etc"

We can also look into suppressing Tokens to eliminate words that we won’t use. Believe we need to find the tokens for words, and then we can use the token ID to ignore those words. More links below.

https://github.com/openai/whisper/blob/15ab54826343c27cfaf44ce31e9c8fb63d0aa775/whisper/decoding.py#L87-L88

https://platform.openai.com/docs/guides/speech-to-text/prompting

https://github.com/openai/whisper/discussions/355

https://github.com/openai/whisper/discussions/117

https://huggingface.co/blog/fine-tune-whisper

https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311/2?u=nbroad