Accessibility has long been an important topic of discussion in the tech industry, with many users advocating for better accessibility options every day. Now, Zoom is taking another step forward and will offer automatic closed captioning to all free accounts by the end of the year.
Experts say this is a step in the right direction, especially with so many relying on the service for their jobs and online learning. But they also would like to see Zoom take things a step further.
“It’s a starting point, but more needs to be done,” Sheri Byrne-Haber, an accessibility architect at VMware, told Lifewire via email. “Creating the ability to add words to a dictionary would be a good next step. Otherwise, people’s names, abbreviations, and terms not normally in dictionaries (like hyper converged infrastructure) might get butchered.”
Accuracy Is Key
Being able to understand each other is a key part of communication, especially in an online environment, where latency, poor video quality, and multiple people speaking at once all get in the way. People with hearing loss who previously could rely on reading lips, or on American Sign Language (ASL) if they knew it, now have to rely on speech recognition systems to relay important information, and the limitations of those systems can add to the confusion.
“There are two things that the speech recognition engines don’t do very well,” Byrne-Haber said later during a call with Lifewire. “The first thing is that it is geared for a more Midwestern or California, flat American accent. So, if you have somebody that speaks English as a second language or somebody from an area like Maine or Texas, where there are very strong accents, it doesn’t recognize words the same. Accents are a problem and technical terms that aren’t in the dictionary are a problem.”
Voice recognition systems need to hit at least a 92% accuracy rate, according to Byrne-Haber. A paper from the Rochester Institute of Technology listed a 90% accuracy rate as the bottom line. Unfortunately, the accuracy of these systems depends heavily on the topic and the person speaking at the time, so results can vary.
“I’ve seen accuracy rates on YouTube captioning where it’s someone from outside of the United States and they are talking about medical terms, and I’ve seen an accuracy rate of below 60%,” she told us.
With accuracy rates that low, people who rely on captioning have a much harder time following along and processing the information being presented. They need to fill in the blanks for words that were picked up incorrectly, which can cause them to fall behind during presentations and makes the entire learning experience much more difficult.
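The article does not say how these accuracy rates are measured. One common convention, assumed here, is word-level accuracy: one minus the word error rate, where errors are the substituted, missing, and extra words in a caption compared with what was actually said. The short Python sketch below illustrates that arithmetic; the function name and the example sentences are made up for illustration and do not come from Zoom, YouTube, or any captioning product.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return word-level accuracy (1 - word error rate), floored at 0.

    `reference` is what was actually said; `hypothesis` is the caption.
    Both the name and the metric choice are assumptions for illustration.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Word-level edit distance (substitutions, insertions, deletions).
    # prev[j] is the distance between the reference words processed
    # so far and the first j caption words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from caption
                            curr[j - 1] + 1,      # extra word in caption
                            prev[j - 1] + cost))  # match or substitution
        prev = curr

    word_error_rate = prev[-1] / len(ref)
    return max(0.0, 1.0 - word_error_rate)


# Hypothetical example: one technical term split into two words.
spoken = "we deployed hyperconverged infrastructure across both data centers"
caption = "we deployed hyper converged infrastructure across both data centers"
print(f"{word_accuracy(spoken, caption):.0%}")  # prints "75%"
```

Under this measure, a single mangled technical term in an eight-word sentence is enough to drop the caption to 75% accuracy, well below the 90% to 92% thresholds cited above.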
Waiting for Zoom
While Zoom plans to release automatic closed captioning to everyone in the fall, the company is letting users who need it now sign up, and it also has a manual closed caption system that might be useful.
Though automatic closed captioning is desperately needed, Byrne-Haber told us she’d rather the company take its time and ensure it is offering a stable and reliable product for all users who need it, instead of rushing out something that feels half-finished.
Byrne-Haber would also like to see Zoom focus on adding features to its closed captioning system. Giving users the ability to customize the color, size, and even text of the captions would go a long way in making captions work for them. This is especially important for people who may have a hard time seeing the white-on-black captions that many closed captioning systems use. Even a feature as small as changing the size of the text could be a huge boon for many.
Another wish-list feature is the ability to add specific words to the speech recognition system’s dictionary. This would make closed captioning more useful for people who often use words or phrases the system does not typically understand.
“Dragon already does this,” Byrne-Haber told us. “I’m surprised more services don’t offer it.”