Accessibility in Firefox for Android: Anatomy and Life of an Interaction
In my first post, I described at a high level how a blind user interacts with Firefox for Android. In this post, I will go into more technical detail about how some of this works and why it works the way it does. (Henceforth, I will assume that you have read the first post.)
It all begins with the touch of a screen or the press of a button on the braille device. A touch will be intercepted by the screen reader, which may choose to interpret it and trigger an Android accessibility action (which Fennec then interprets). Otherwise, touch events pass directly through to Fennec, and we interpret them as appropriate. An interaction from the braille device arrives as an Android accessibility action. Once the browser knows what the user wants to do, we can just go ahead and do it.
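To make this concrete, here is a minimal, hypothetical Java sketch of how such an accessibility action might be dispatched on the Android side. The class and helper names here (BrowserAccessibilityView, moveVirtualCursor, activateCurrentItem) are illustrative assumptions, not Fennec's actual code; they only show the general shape of handling performAccessibilityAction.

```java
import android.content.Context;
import android.os.Bundle;
import android.view.View;
import android.view.accessibility.AccessibilityNodeInfo;

// Hypothetical view that receives accessibility actions from the screen
// reader (or from a braille display via BrailleBack) and hands them to
// the browser's internal navigation logic.
class BrowserAccessibilityView extends View {
    BrowserAccessibilityView(Context context) {
        super(context);
    }

    @Override
    public boolean performAccessibilityAction(int action, Bundle arguments) {
        switch (action) {
            case AccessibilityNodeInfo.ACTION_NEXT_HTML_ELEMENT:
                // Move the virtual cursor forward through the page content.
                return moveVirtualCursor(true);
            case AccessibilityNodeInfo.ACTION_PREVIOUS_HTML_ELEMENT:
                return moveVirtualCursor(false);
            case AccessibilityNodeInfo.ACTION_CLICK:
                // Activate the currently focused element (link, button, ...).
                return activateCurrentItem();
            default:
                return super.performAccessibilityAction(action, arguments);
        }
    }

    // Placeholders for the browser-side handling; the real work happens
    // inside the browser engine.
    private boolean moveVirtualCursor(boolean forward) { return true; }
    private boolean activateCurrentItem() { return true; }
}
```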
Most actions produce some sort of output, in speech and/or braille. Most notably, exploring the screen generates output describing the currently focused object. So what do we need to include in this description? Obviously, we include the text of the object itself (e.g. when focused on a link, we need the link text). We also include the role (link, button, dropdown, checkbox, etc.) and the state (e.g. a checkbox can be checked, or a button pressed). All of this information is included in some form in both speech and braille output. But that is where the similarity ends.

Speech and braille are quite different forms of output. In speech, you aren’t as restricted in terms of verbosity; a long string of speech can be processed relatively quickly. Conversely, a braille display typically has between 14 and 80 braille cells (with the latter being not so portable). Thus, for speech output, we can include more context about the object you’re focused on. For example, on a list item, we indicate the current position within the list (e.g. item 5 of 10). We also provide more general context, such as the fact that you’re entering a table or the main body of a page.

This sort of output isn’t as feasible in braille, especially on a smaller, more mobile braille display. But we can leverage the fact that braille is a more spatial output medium. In particular, it’s easy to check exactly how something is spelled (which is difficult when listening to a text-to-speech engine). Thus, we can shorten common roles and states to a more condensed format. For example, we can shorten the role name “button” to “btn”, and we can represent a checked checkbox as “(x)” (with the unchecked variant being “( )”). This lets us make the best use of the limited braille output.
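As a rough illustration (not the actual Fennec code), a formatter along these lines could derive the two flavors of output from the same node information. The class and method names, and the role strings beyond the examples given above, are assumptions for the sake of the sketch.

```java
// Hypothetical formatter showing how the same node information might be
// rendered verbosely for speech and in condensed form for braille.
final class OutputFormatter {
    // Speech: verbosity is cheap, so include role, state, and list context.
    static String forSpeech(String text, String role, boolean checked,
                            int posInSet, int setSize) {
        StringBuilder out = new StringBuilder(text);
        out.append(", ").append(role);
        if (role.equals("check box")) {
            out.append(checked ? ", checked" : ", not checked");
        }
        if (setSize > 0) {
            out.append(", item ").append(posInSet).append(" of ").append(setSize);
        }
        return out.toString();
    }

    // Braille: a 14-80 cell display, so condense roles and states.
    static String forBraille(String text, String role, boolean checked) {
        if (role.equals("check box")) {
            return (checked ? "(x) " : "( ) ") + text;
        }
        if (role.equals("button")) {
            return text + " btn";
        }
        return text;
    }
}
```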
Once we’ve decided on the output within the browser, how do we get the screen reader to speak it, or the braille device to display it? For speech output, we populate an AccessibilityNodeInfo with the right text and information so that TalkBack (or another screen reader) can speak it. If we did only this, the braille output would be exactly the same as the speech output, which would be suboptimal for the reasons mentioned above. Instead, we also use the SelfBrailleService provided by BrailleBack (the braille accessibility service on Android), which lets us send our more optimized output to the braille device.
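Sketched in Java, the two output paths might look roughly like the following. The SelfBrailleClient and WriteData names come from the eyes-free self-braille client library that BrailleBack exposes, but the exact package, constructor arguments, and the AccessibleOutput helper are recalled or assumed here rather than copied from Fennec's source.

```java
import android.content.Context;
import android.view.View;
import android.view.accessibility.AccessibilityNodeInfo;

// Assumed package names from the eyes-free self-braille client library.
import com.googlecode.eyesfree.braille.selfbraille.SelfBrailleClient;
import com.googlecode.eyesfree.braille.selfbraille.WriteData;

// Hypothetical helper showing the two output paths: speech via the
// AccessibilityNodeInfo text, braille via BrailleBack's self-braille service.
final class AccessibleOutput {
    private final SelfBrailleClient brailleClient;

    AccessibleOutput(Context context) {
        // The boolean argument's exact meaning is library-specific; false is
        // used here as a conservative default (assumption).
        brailleClient = new SelfBrailleClient(context, false);
    }

    // Speech path: the screen reader speaks whatever we put on the node.
    void populateNode(AccessibilityNodeInfo node, String speechText) {
        node.setText(speechText);
        node.setEnabled(true);
    }

    // Braille path: hand our condensed string directly to the braille display.
    void brailleNode(View hostView, String brailleText) {
        WriteData data = WriteData.forView(hostView);
        data.setText(brailleText);
        brailleClient.write(data);
    }
}
```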
After all of that, one action is complete and the output is provided in the appropriate medium. Just rinse and repeat a few thousand times for a full user session. I hope this helps in understanding how assistive technologies work.