document status: internal SIG draft
revision date: 2008-02-04
previous versions: Draft 2.03b (2008-01-28), Draft 2.03 (2008-01-26), Draft 2.02b (2008-01-20), Draft 2.02a (2008-01-14), Draft 2.01 (2007-12-22)
authors: Pete Brunet, Vladmir Bulatov, Gregory J. Rosmaita, Janina Sajka and Neil Soiffer (chair, Expert Handlers SIG)
edited and annotated by Gregory J. Rosmaita
Please provide feedback on this draft to the Expert Handlers SIG either via the Expert Handlers emailing list (preferred) or directly on this document's "Discussion" page, to which posted comments will be appended by the editor. There is also a Scratch Pad for the Unified Use Cases, which serves as a collection point for issues and ideas related to expert handlers and their possible implementations.
The purpose and responsibility of accessibility interfaces, such as Microsoft Active Accessibility (MSAA) and IAccessible2 (IA2), is to provide assistive technology (AT) with the ability to access and interact with the information contained in an application. This allows an AT to access the information in the application's DOM. Interpreting, displaying, and navigating the information is the responsibility of the AT.
The success of the web and the increasing use of XML in documents has led AT to develop support for these “markup based” applications. Therefore, we must distinguish between two classes of markup in order to explain the need for expert handler technology. The type of markup most often used on the web is HTML 4.01/XHTML 1.0. HTML is an example of generalized markup that is well handled by AT and is not addressed in our document further. However, it is significant to note that even generalized markup must sometimes be complemented by markup specifications, such as ARIA, that facilitate more semantically precise content handling where none is present.
Generalized content markup is complemented by markup specifications that facilitate more semantically precise content markup. Examples of specialized, semantically precise markup include MathML and MusicXML. In order for users of AT to access specialized markup effectively, AT needs guidance to communicate the content of the specialized markup language to the user.
The Open A11y expert handlers SIG is exploring a standardized plug-in mechanism for AT software. The goal of this plug-in standard is to allow AT software to take advantage of expert software that understands specialized markup. This plug-in standard will allow the expert software to provide enhanced, semantically rich access to specialized markup, so that the AT can properly render the markup visually, aurally, and/or tactilely. The plug-in would also help users navigate the semantic meaning encoded in the specialized markup.
To provide some background as to what needs to be supported by an expert handler interface standard, the following sections discuss a number of use cases for an expert handler. The use cases are divided into various functionalities such as speech, navigation, and braille generation. The last section discusses the options for how an expert handler might fit into the sequence of events that eventually results in a response to a user action.
Computer users who are blind or severely visually impaired often use assistive technology (AT) built around synthetic text to speech (TTS). These AT applications are commonly called “screen readers.” Screen reader users listen to a synthetic voice rendering of on screen content because they are physically unable to see this content on a computer display monitor.
Because synthetic voice rendering is intrinsically temporal, whereas on screen displays are (or can easily be made) static, various strategies are provided by screen readers to allow users to tightly control the alternative TTS rendering. Screen reader users often find it useful, for instance, to skim through content until a particular portion is located and then examine that portion in a more controlled manner, perhaps word by word or even character by rendered character. It is almost never useful to wait for a synthetic voice rendering that begins at the upper left of the screen and proceeds left to right, row by row, until it reaches the bottom because such a procedure is temporally inefficient, requiring the user to strain to hear just the portion desired in the midst of unsought content. Thus, screen readers provide mechanisms that allow the user to focus anywhere in the content and examine only that content which is of interest.
Screen readers have proven highly effective at providing their users access to content which is intrinsically textual and linear in nature. It is not hard to provide mechanisms to focus synthetic voice rendering paragraph by paragraph, sentence by sentence, word by word, or character by character.
Access to on screen widgets has also proven effective by rendering that static content in list form, where the user can pick from a menu of options using the up and down arrow keys plus the enter key to indicate a selection, in lieu of picking an icon on screen using a mouse.
Access to content arrayed in a table can also succeed by allowing the AT to simulate the process a sighted user employs to consider tables. In other words, mechanisms are provided to hear the contents of a cell and also the row and column labels for that cell (which define the cell's meaning).
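The cell-plus-labels lookup described above can be sketched in a few lines. This is only an illustration: the function name `describe_cell` and the assumption that the first row and first column hold the labels are ours, not part of any accessibility API.

```python
from typing import List


def describe_cell(table: List[List[str]], row: int, col: int) -> str:
    """Return a cell's content together with its row and column labels.

    Assumption for this sketch: row 0 holds the column labels and
    column 0 holds the row labels.
    """
    if row < 1 or col < 1:
        raise ValueError("row and col must point at a data cell, not a label")
    row_label = table[row][0]
    col_label = table[0][col]
    return f"{row_label}, {col_label}: {table[row][col]}"


schedule = [
    ["", "Mon", "Tue"],
    ["Alice", "9am", "off"],
    ["Bob", "off", "1pm"],
]
spoken = describe_cell(schedule, 2, 2)
```

A screen reader would pass a string such as `spoken` to its TTS engine, so the user hears the labels that give the cell its meaning before the cell content itself.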
Similar “smart” content rendering and navigation strategies are required by screen reader users in more complex, nonlinear content such as mathematical (chemical, biological, etc.) expressions, music, and graphical renderings. Because such content is generally the province of knowledge domain experts and students, and not the domain of most computer users, screen readers do not invest the significant resources necessary to serve only a small portion of their customer base with specialized routines for such content. Furthermore, the general rendering and navigation strategies provided for linear (textual), menu, and tabular content are woefully insufficient to allow users to examine specific portions of such domain specific expressions effectively. On the other hand, domain specific markup often does provide sufficient specificity so that the focus and rendering needs of the screen reader can be well supported.
In order to gain effective access to such domain specific content screen reader users require technology that can:
There are users with disabilities who do not require accommodation in order to read domain specific markup. Rather, these users require assistive technologies to facilitate their scrolling and/or editing of content. Highly effective assistive technologies exist to accommodate alternative input strategies ranging from:
Users of alternative input assistive technologies require two specific accommodations for scrolling and editing domain specific content:
AT users need to be able to navigate within sub-components of documents containing specialized content, such as math, music or chemical markup. Typically these specialized components have content which needs to receive “focus” at different levels of granularity, e.g. a numerator within a numerator, an expression, a term, a bar of music, etc.
Within each level, functions are needed in response to AT commands to inspect and navigate to and from “items” (e.g., by word, bar, expression, clause, term, depending upon the type of content being expressed) for a particular level of granularity:
There are two scenarios to consider, a read-only scenario and a scenario where the user is editing the document.
There are three system components that need to interact: the user agent (e.g. a browser), the AT, and the expert handler.
In the read-only case, the AT responds to some sort of “Point of Regard” change event and depending on the “role” of the object which received focus, the AT fetches accessibility information pertinent to that role and then formats/outputs a response tailored to an AT user, e.g. TTS/braille. In the case of specialized content, an expert handler needs to be used by the AT because the AT doesn't know how to deal with such specialized content directly.
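The read-only flow above can be sketched as a role-based dispatch. Everything named here (`AssistiveTechnology`, `register_handler`, `on_point_of_regard_changed`, `toy_math_handler`) is hypothetical and chosen for illustration; no interface has been defined by the SIG.

```python
from typing import Callable, Dict

# Hypothetical signature: (markup, output_mode) -> text for the AT to render.
ExpertHandler = Callable[[str, str], str]


class AssistiveTechnology:
    """Toy AT that delegates specialized roles to registered expert handlers."""

    def __init__(self, output_mode: str) -> None:
        self.output_mode = output_mode  # e.g. "tts" or "braille"
        self._handlers: Dict[str, ExpertHandler] = {}

    def register_handler(self, role: str, handler: ExpertHandler) -> None:
        self._handlers[role] = handler

    def on_point_of_regard_changed(self, role: str, content: str) -> str:
        handler = self._handlers.get(role)
        if handler is None:
            # Generalized content: the AT formats the response itself.
            return content
        # Specialized content: the AT does not understand it, so it asks
        # the expert handler for output suited to the current mode.
        return handler(content, self.output_mode)


def toy_math_handler(markup: str, output_mode: str) -> str:
    # Placeholder rendering; a real handler would interpret the markup.
    return f"math for {output_mode}: {markup}"


at = AssistiveTechnology("tts")
at.register_handler("math", toy_math_handler)
plain = at.on_point_of_regard_changed("paragraph", "Hello")
special = at.on_point_of_regard_changed("math", "<mfrac/>")
```

The point of the sketch is only the branching: the role of the object at the point of regard decides whether the AT answers on its own or consults a handler.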
In order to meaningfully interact with the specialized content, the user needs to be able to execute the following actions:
In the case of editable content there may also be a desire to have separate cursors, e.g. one to remain at the POR (the caret, if editing), and one to move around for review purposes.
The AT will already have UI input commands for most of the above functions, but probably not for changing to higher/lower levels of granularity. When the user asks for a different level of granularity, the AT would call the handler to change the level of granularity. The AT will handle the UI commands and in turn call the handler to return an item at the current level of granularity. The AT would have told the handler about the output mode, e.g. braille or TTS. Armed with those three things (level of granularity, mode of output, and which item: first, last, previous, current, or next), the handler knows what to do.
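The three inputs named above (level of granularity, output mode, and which item) suggest a small handler interface. The following is a minimal sketch under our own assumptions: the class and method names are hypothetical, the content is pre-split into levels, and changing level resets the position to the first item, which a real handler would likely refine.

```python
from enum import Enum
from typing import List


class OutputMode(Enum):
    TTS = "tts"
    BRAILLE = "braille"


class Item(Enum):
    FIRST = "first"
    LAST = "last"
    PREVIOUS = "previous"
    CURRENT = "current"
    NEXT = "next"


class SketchHandler:
    """Hypothetical expert handler; levels[0] is the coarsest granularity."""

    def __init__(self, levels: List[List[str]]) -> None:
        self._levels = levels
        self._level = 0
        self._index = 0
        self._mode = OutputMode.TTS

    def set_output_mode(self, mode: OutputMode) -> None:
        self._mode = mode

    def change_granularity(self, delta: int) -> int:
        """Move to a finer (+1) or coarser (-1) level, clamped to valid levels."""
        self._level = max(0, min(len(self._levels) - 1, self._level + delta))
        self._index = 0  # simplification: restart at the first item
        return self._level

    def get_item(self, which: Item) -> str:
        items = self._levels[self._level]
        last = len(items) - 1
        if which is Item.FIRST:
            self._index = 0
        elif which is Item.LAST:
            self._index = last
        elif which is Item.PREVIOUS:
            self._index = max(0, self._index - 1)
        elif which is Item.NEXT:
            self._index = min(last, self._index + 1)
        # Item.CURRENT leaves the position unchanged.
        # Placeholder formatting: prefix the item with the output mode.
        return f"{self._mode.value}:{items[self._index]}"


handler = SketchHandler([
    ["fraction"],
    ["numerator a plus b", "denominator c"],
    ["a", "plus", "b", "c"],
])
```

In this arrangement the AT keeps ownership of the keyboard commands and only translates them into `change_granularity` and `get_item` calls; the handler holds the position and knows how to render each item for the requested output mode.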
In the case of editable content, the UA provides the input UI for the user. This editing capability would most likely be provided via a plugin. Specific accessibility features needed for editing specialized markup have yet to be explored.
A common use of magnification is to proportionately enlarge content. For text-based (or more generally, font-based) applications, this means that AT software should be able to request rendering with larger sized fonts or a certain amount of magnification relative to some baseline magnification. Applications beyond standard text-based ones include math, music, and labeled plots/graphics. For non text-based applications such as graphics and chemical structures, magnification could be based on a certain percentage of the normal size or given by “fill this area”. These two ideas can always be mapped onto each other. In all of these cases, the magnification may be due to having the entire document magnified or it may be due to a request to magnify an individual instance (such as an equation).
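The mapping between "percentage of normal size" and "fill this area" can be sketched as follows. The sketch assumes that the content has a known natural (100%) size and that the aspect ratio is preserved; the function names are ours.

```python
from typing import Tuple

Size = Tuple[float, float]  # (width, height) in any consistent unit


def percent_to_area(natural: Size, percent: float) -> Size:
    """Area occupied by content of the given natural size at `percent` magnification."""
    width, height = natural
    scale = percent / 100.0
    return (width * scale, height * scale)


def area_to_percent(natural: Size, area: Size) -> float:
    """Largest magnification (in percent) at which the content still fits in `area`."""
    width, height = natural
    if width <= 0 or height <= 0:
        raise ValueError("natural size must be positive")
    area_width, area_height = area
    # The tighter of the two dimensions limits the scale.
    return min(area_width / width, area_height / height) * 100.0


equation = (200.0, 100.0)
enlarged = percent_to_area(equation, 150.0)
fit = area_to_percent(equation, (600.0, 400.0))
```

With a natural size of 200 by 100, a 600 by 400 area is limited by its width, so the content is rendered at three times its natural size and leaves unused height.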
There are two other uses for magnification:
An expert handler should be able to provide braille data for braille display output by generic AT. Custom braille output is needed, because generic AT has no knowledge about how specific specialized data can and should be represented via braille. An example is mathematics: there are many different braille codes used to represent mathematics that vary from country to country and agency to agency.
Simple ASCII strings are normally used to communicate braille to braille devices. However, there are a lot of specific ASCII-to-dots pattern-encoding tables used to generate braille that conform to a natural language's braille conventions. Therefore AT and the expert handler have to negotiate the most appropriate braille table to be used. A more universal approach would be to use the special braille Unicode symbols which range from
There is also a need to have braille output tailored to various levels of granularity. For example, at a low level of granularity, the user would receive an overall description of the mathematical expression or image, while at the highest level of granularity, the user would receive a complete braille translation of the whole math expression or a list of all labeled components of the image.
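As an illustration of the table-driven translation and the Unicode alternative discussed above, the sketch below converts dot numbers to characters in the Unicode Braille Patterns block, which begins at U+2800 and encodes dots 1 to 8 as bits 0 to 7 of the offset. The three-letter table and the function names are assumptions for the example; a real table would cover a complete braille code.

```python
from typing import Dict, Iterable, Tuple

BRAILLE_BASE = 0x2800  # first code point of the Unicode Braille Patterns block


def dots_to_unicode(dots: Iterable[int]) -> str:
    """Map raised dot numbers (1 to 8) to the matching Unicode braille character."""
    mask = 0
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError(f"invalid dot number: {dot}")
        mask |= 1 << (dot - 1)
    return chr(BRAILLE_BASE + mask)


# Illustrative table only: the literary braille letters a, b, and c.
SAMPLE_TABLE: Dict[str, Tuple[int, ...]] = {
    "a": (1,),
    "b": (1, 2),
    "c": (1, 4),
}


def translate(text: str, table: Dict[str, Tuple[int, ...]]) -> str:
    """Translate text cell by cell using the given character-to-dots table."""
    return "".join(dots_to_unicode(table[ch]) for ch in text)


cells = translate("cab", SAMPLE_TABLE)
```

Because the dot pattern is carried by the character itself, an AT receiving `cells` does not need to know which ASCII-to-dots table the handler used.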
Some data may need to be expressed in a more advanced tactile output format than refreshable braille. For example, graphical data would greatly benefit from being embossed on paper or shown on a 2D braille display. Input devices, such as a touchpad or camera, allow a user to tell the computer which parts of the graphic are of interest and need to be tactilely displayed. Such interactive functionality should be left exclusively to the expert handler. This means that an expert handler must have an interactive mode and a way for an AT to toggle this mode. In such a mode, an AT should also provide a way for the expert handler to produce more than one output stream – such as simultaneous speech and braille output – directly via an AT device which uses the same TTS engine and/or braille display.
The user must have a means of obtaining all available information about the object/character with focus, beginning with the repetition of the character or the programmatic binding which describes the object with focus. The ability to query the AT to determine one's point of regard within a document and within containers in the document is essential. The user must be able to obtain information about the current point of regard from the most generic level – what percentage of the document or section has been read, how much of the document or section remains to be read – to the most atomic. Therefore, an AT must create a User Interface where successive “Where Am I?” queries by the user generate more verbose or more terse responses. [footnote 2]
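One way to picture successive “Where Am I?” queries is a counter that steps through prepared descriptions of the same point of regard. The sketch below is hypothetical throughout (class name, method names, sample strings) and arbitrarily orders the responses from terse to verbose.

```python
from typing import List


class WhereAmI:
    """Repeated queries at one point of regard step through the descriptions."""

    def __init__(self, descriptions: List[str]) -> None:
        if not descriptions:
            raise ValueError("at least one description is required")
        self._descriptions = descriptions
        self._queries = 0

    def query(self) -> str:
        # Stay on the last (most verbose) description once it is reached.
        index = min(self._queries, len(self._descriptions) - 1)
        self._queries += 1
        return self._descriptions[index]

    def point_of_regard_changed(self, descriptions: List[str]) -> None:
        """Moving the point of regard restarts the sequence."""
        if not descriptions:
            raise ValueError("at least one description is required")
        self._descriptions = descriptions
        self._queries = 0


locator = WhereAmI([
    "x",
    "x, in numerator",
    "x, in numerator of fraction 2 of 3, 40 percent of section read",
])
first = locator.query()
second = locator.query()
```

For specialized markup, the descriptions themselves would have to come from the expert handler, since only it knows what the enclosing containers mean.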
A user may find it necessary to consult a “Document Summary”, containing a list of the types of elements and containers in the document. The user needs to know the document title and language as well as the number of tables, links, headings, frames, forms, controls, items, images, and pages. The application may implement a document summary feature natively through its own UI instead of an accessibility API, but in the case of specialized markup, may need the assistance of an expert handler in order to present an appropriate document summary for the content being summarized.
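A document summary of the kind described above amounts to the title, the language, and a count of each element type. The sketch below assumes that some earlier step has already produced a flat list of element type names; the function name and the dictionary layout are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Union

Summary = Dict[str, Union[str, Dict[str, int]]]


def summarize(title: str, language: str, element_types: Iterable[str]) -> Summary:
    """Build a document summary: title, language, and a count per element type."""
    counts = Counter(element_types)
    return {
        "title": title,
        "language": language,
        # Sorted so the summary is presented in a stable order.
        "counts": dict(sorted(counts.items())),
    }


summary = summarize(
    "Quadratic equations",
    "en",
    ["heading", "link", "math", "link", "table", "math", "math"],
)
```

For specialized content, an expert handler could supply the type names, for instance reporting the equations in a MathML document or the parts and measures in a score.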
The goal of the Expert Handlers working group is to define a standard so that AT software can call on expert software to interpret specialized markup. One issue that needs to be addressed is how and where (in the flow of control of reading a page) the expert handler should be invoked. Here are three possibilities:
Although similar, the latter two cases probably have implications for the difficulty of implementation and the capabilities of the interface. Some of these are:
Note 1. For example, the LABEL grouping and labelling mechanisms for FORM controls or the id relationship defined for TABLE in HTML 4.01/XHTML 1.0 or the ARIA markup “labelledby” (http://www.w3.org/TR/aria-roles/#labelledby) and “
Note 2. For Specialized Markup Languages, the following list of points of regard needs to be broadened and abstracted into a context meaningful to the content and structure achieved through the use of a particular specialized markup language. For example, a musical score marked up in an XML-derived dialect would frame its points of reference in a manner conformant with the structure of the content being accessed: by stanza, by bar, by note, and so on. The level of granularity necessary to provide meaningful interaction between the user of an AT and a specific markup language is highly dependent upon the type of specialized content being described, as well as the parameters and structures inherent to the specialized knowledge domain for which the specialized markup language has been designed.
For each potential point of regard possible in a specific Generalized Markup Language, the AT requires, and can usually obtain from the document's structure and semantics, as reflected in the DOM, the following element characteristics, if they exist, depending on the type of elements in the item at the current POR:
OPTGROUP in HTML) if in a group
alt in HTML or