ToggleSub Q&A
Written by Administrator   
Saturday, 26 October 2024 13:03

Questions & Answers:


Why is the transcription often not accurate?

1) The quality of the input. If the speaker's articulation is not precise enough, the transcribing servers may struggle. A strong dialect or accent, speaking too fast, or a volume that is simply too low all cause problems.

2) Background noise. The servers try to extract speech from any surrounding sounds but may fail to do so precisely. Other people talking at close range, or music, is deadly for speech recognition. Hearing people can easily distinguish speech from noise and music, but speech recognition computers cannot (yet).

3) A bad internet connection. If the network connection is slow, the audio is not sent to the servers fast enough, so the servers only process parts of the speech, resulting in incomplete transcriptions.

4) A small language. Simply put, Apple invests more money, energy and training in bigger languages like English, followed by Spanish and Chinese. Smaller languages are less important to Apple.

5) Interpretation. Nowadays, transcribing speech is more than just converting sounds. The servers try to analyse the meaning of the spoken sentence and respond with their understanding, which may be wrong. It can even happen that you first see the right word(s), which then suddenly change into a wrong interpretation. Silly but unavoidable.


Why does the transcription often stall for a moment?

1) A bad internet connection.

2) Apple's servers are overloaded.

3) Apple limits continuous speech-to-text streams to just under a minute. That means the app has to close and restart the stream after roughly 50 seconds, which will most likely cause a hiccup in the transcription.
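The restart logic behind that behaviour could be sketched as a simple elapsed-time check. This is a minimal illustration only; the function name and the exact 50-second threshold are assumptions, not ToggleSub's actual code:

```swift
import Foundation

// Hypothetical sketch: decide whether a continuous recognition stream
// should be restarted. Apple caps streams at around one minute, so a
// threshold of ~50 seconds (an assumed value) leaves a safety margin.
let streamLimit: TimeInterval = 50

func shouldRestartStream(startedAt: Date, now: Date = Date()) -> Bool {
    return now.timeIntervalSince(startedAt) >= streamLimit
}
```

Restarting slightly before the hard limit trades a small, predictable hiccup for an unpredictable mid-sentence cut-off by the server.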


Why do I sometimes see nothing, or really wrong words, appearing?

Besides possible problems with the issues mentioned above, you could have selected the wrong language. Neither the app nor the servers have automatic language recognition, so you have to select the right language to transcribe. Also, nowadays people often use foreign words, which may confuse the servers. For example, speaking Dutch to a computer listening for English will give unexpected results.


Why don't I see punctuation at all, or capitals at the beginning of a new sentence?

Unfortunately, Apple's speech recognition (and, by the way, most others as well) is still not clever enough to really understand what is said or who said what. That simply means the machines have no clue where to put a full stop and start a new sentence, let alone commas, question marks or exclamation marks. However, it is expected that this will improve automatically in the near future, even without updating the ToggleSub app: the software and artificial intelligence on Apple's speech recognition servers will definitely improve over time.

Only when a significant pause is recognised, or when a continuous speech stream had to be restarted after ~50 seconds, will the app clear the subs and start a new line with a capital.
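That new-line capitalisation could look something like the following. This is a hypothetical sketch (the function name is invented for illustration), not ToggleSub's actual implementation:

```swift
import Foundation

// Hypothetical sketch: when a pause or a stream restart begins a new
// subtitle line, capitalise its first character; text that continues
// an existing line is passed through unchanged.
func formatSubtitle(_ text: String, startsNewLine: Bool) -> String {
    guard startsNewLine, let first = text.first else { return text }
    return String(first).uppercased() + text.dropFirst()
}
```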


I can't access other app windows with the mouse in the area where the subtitles are displayed.

That's right, and by choice. To display the subtitles always on top, the overlay is (apart from some Apple features like the Dock) the top-most window. Since it is transparent, it looks like you can access the windows underneath, but you cannot. And that is where the Toggle comes into play: CTL-ALT-CMD-S switches the window off, you do what you want to do, and then you switch it on again. It is highly recommended to switch it off when you are not transcribing at all. Not only will you have no issues accessing anything else on the screen, you also avoid useless network traffic.

To get used to it, you can switch on a border in the System tab of the Preferences, which will light up whenever the mouse enters the subtitle area. After a while, as you become more experienced with the app, you will probably switch it off again.


What is that percentage popping up every now and then?

Apple's transcription servers irregularly send a "confidence" percentage. It is an indication of how confident they are in transcribing the speech into readable text. However, since the servers are actually pretty good with continuous, clearly articulated, undisturbed speech, it is really more an indication of the quality of the input and the connection. A green colour means pretty good, a red colour pretty bad. If it is always red (or yellow), you should look at the quality of the input.
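The colour coding amounts to a simple mapping from percentage to colour. The sketch below is purely illustrative; the thresholds (75 and 50) are assumptions, not ToggleSub's actual values:

```swift
// Hypothetical sketch: map a confidence percentage to the indicator
// colour. The 75/50 thresholds are illustrative assumptions.
func confidenceColour(_ percentage: Int) -> String {
    switch percentage {
    case 75...:   return "green"  // pretty good
    case 50..<75: return "yellow" // doubtful
    default:      return "red"    // pretty bad: check the input quality
    }
}
```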

The issue is that this confidence percentage is only sent by the servers when there is a pause, a probable end of speech. In discussion programmes or interviews, however, people tend to keep talking at a high rate, so when there finally is a sort of pause, the confidence percentage hardly reflects reality.

When it gets a bit annoying, you can switch it off in the app's Preferences, just like the volume bar.

Last Updated on Saturday, 26 October 2024 19:47