Captions are not being generated accurately

Rae Davis asked 4 years ago

Just wanted to see if there is anything that can be done to have my captions generate more accurately. The last video (Grammarly) I’m working on needs a lot of corrections, which defeats the purpose of the app. Please advise what can help me with this issue.

1 Answer
Craig Staff asked 4 years ago

Sure thing, Rae.
Caption accuracy depends on three factors.  Let me go through them, and then I’ll address your video directly, which I just looked at.
1.  Audio capture.  If you capture your audio with a microphone that is far away from you, especially in a room with “room noise” (meaning echo, reverb, or anything else that messes with the sound of your voice), your transcriptions could suffer.  I always recommend people use some type of external microphone.  If that isn’t possible, then the camera needs to be close to you, framed more like a headshot, to get clear audio, and you should record in a room without reverb, echo, or room noise.
2.  Language.  I think we make this pretty clear on the site, so it’s probably not an issue, but if you speak Irish, Welsh, Australian, etc., make sure you choose that flavor of English.
3.  Accent.  If you have a strong, atypical accent, this could have some effect.  In that case, I would probably just slow down a little, but again, I believe a microphone will seriously help with this issue as well.
I have a feature coming that will let you essentially train the AI to understand words that you say a lot and that it gets wrong, so this will help too.  I also have a feature that will let you mass-replace text easily, which will help as well.  But these are all post-processing.
In the video world, there is a saying: “Fix it in post.”  It means, “Don’t worry about problems while shooting the video; we can fix them in post” (post-processing, or the editing phase).  But any video pro who knows their stuff hates this phrase.  It is always better to get the best footage and audio you possibly can, because in reality it is very difficult to fix a number of things “in post.”
So, in your case, the issue is room noise.  In video terms, your audio sounds ‘muddy.’  That means other noise is getting mixed in with your vocals the way water and mud mix together.  And just as it is hard to see where the water ends and the dirt begins, it is hard to tell where your voice ends and the noise begins.  Having an accent accentuates the issue, but I don’t believe your accent is the cause.

You appear to be reasonably close to the device you are shooting with, so it could be due to the room itself.  If the floor is solid and the walls are concrete, for example, that will contribute to a muddy sound.  Ideally, you should record in a room with carpet and drywall.
One thing I would try…  If you are the subject, try recording a short video with your smartphone, with the microphone close to you, in a room without echo (a clothes closet is the best soundproof room in the house, BTW).
Then upload that and see if things improve.  If they don’t, let me know.
If they do, then I would start working on improving your audio.  A clip-on microphone would be a cheap way to get started.
And here is the thing…  Improving your audio isn’t just for the subtitles.  If the subtitle AI can understand you, everyone can.  Sound and lighting are the two most important parts of video recording, so improving your sound always works to your benefit and will make your viewers happy too.
Let me know how else I can help.