Discussion on Profile Structure


This page tracks some of the work of the Needs and Preferences Working Group.

Proposed key points for discussion

  • Our work should support the use of multiple ontologies that can be applied to the COMMON ATOMIC TERMS to provide different views/organizations/relationship maps to meet different needs.

Topic Block: Registry

Agreed # Proposed key point for discussion
approved 1 Registry structure. (Discussion, including list of registry entries) Note: The most recent description of the record format for registry terms can be found on the REGISTRY page.
approved 7 The meaning of a value instance in a user preference profile is defined by a property definition in the Registry. (Discussion)
approved 18 Both core and non-core properties are stored in the registry database. (See use of "isCore" in Registry structure.) (Discussion - empty)
approved 28 Language-specific labels are provided for property identifiers, enumerated values in the values space, description and notes (see issue #1). (Discussion - empty)
approved 3 Property names are URIs, values are of any type that can be stored as a string. (Discussion - empty)
approved 9 Does a key definition consist of its URI and its type and value space? (see issue #1) (Discussion)
approved 5 There are two categories in the registry: core and non-core. (Discussion - empty)
approved 6 The registry for COMMON Property terms is called "COMMON Property Registry". The registry for application-specific property terms is called "Application-Specific Property Registry". (Discussion - empty)
approved 21 A part of ISO/IEC 24751 should define the data model of the registry. (Discussion - empty)
approved 44 One class of items in the registry are called properties.
approved 47 COMMON properties are defined in the namespace http://gpii.net/ns, APPLICATION-SPECIFIC properties must use other namespaces (e.g. http://microsoft.com/ns/, urn:upnp-schemas:something).
approved 48 A property name is a URL which points to the definition of the property (definition is specified in issue #1).
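
The namespace rule in key point 47 can be illustrated with a minimal sketch. It assumes that local names are appended to the COMMON namespace after a "/" separator, which the key points do not state; the two property URIs are hypothetical examples, not registry entries.

```python
# COMMON namespace as given in key point 47.
COMMON_NAMESPACE = "http://gpii.net/ns"


def is_common_property(property_uri: str) -> bool:
    """Return True if the URI lies in the COMMON namespace.

    Assumption: local names follow the namespace after a "/" separator.
    """
    return (property_uri == COMMON_NAMESPACE
            or property_uri.startswith(COMMON_NAMESPACE + "/"))


# Property names are URIs; values are stored as strings (key point 3).
entries = {
    "http://gpii.net/ns/fontSize": "18",          # hypothetical COMMON term
    "http://microsoft.com/ns/someSetting": "on",  # hypothetical APPLICATION-SPECIFIC term
}

common = [uri for uri in entries if is_common_property(uri)]
application_specific = [uri for uri in entries if not is_common_property(uri)]
```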

Topic Block: Registry Maintenance

Agreed # Proposed key point for discussion
approved 19 The registry is hosted by Raising the Floor International (for now under reg.gpii.net). It is intended to be referenced by ISO/IEC 24751. (Discussion - empty)
approved 25 Anyone can request to add a property.  A maintenance team will decide whether it is CORE or NON-CORE, COMMON or APPLICATION-SPECIFIC.  (Discussion)
approved 22 A part of ISO/IEC 24751 should define the process of adopting properties into the registry on a regular basis. (Discussion)
approved 20 We should have a web-based user interface and API (e.g. RESTful) interface to provide access to the COMMON registry for the public.  Use cases: Browse, request addition, request modification. (Discussion - empty)
approved 29 Property definitions can only be "versioned" by giving them a new URI (namespace or local identifier). (Discussion)
approved 14 Where other standards provide useful key definitions, we can adopt their meaning, and provide missing parts in our namespace, and reference them.  If they are identical, we produce a remote alias for the term. (Discussion - empty)
15 Will we develop common condition terms (e.g. device and usage context)?  (Discussion - empty)
approved 16 The core properties should be the properties that have been accepted (as definitions) by a maintenance team to be described by the proposed ISO 24751 process.  All other COMMON properties are NON-CORE. (Discussion - empty)
outdated 17 Are the non-core properties the key definitions that have been provided by vendors, user groups, other standards groups, or any third parties? (outdated by resolution on topic 16)
43 What is the value space of the 'status' field for a property specification in the COMMON property registry? (cf. issue #1)? (Discussion - action item)
approved 45 Services access the registry for the following use cases: retrieve definition, add definition, modify definition (only internally, not for public), with history.
46 Do we allow duplicate core properties for the same measure in different units, e.g. inches vs. millimeters?  Or are these just different conditions? (Discussion)
approved 48 COMMON properties have their definitions in the COMMON REGISTRY.  APPLICATION-SPECIFIC properties have their definitions in an APPLICATION-SPECIFIC REGISTRY.  APPLICATION-SPECIFIC REGISTRIES can be hosted by GPII or in any other location the developer wishes.
approved 49 APPLICATION-SPECIFIC properties should be defined in a registry other than the COMMON REGISTRY.

Topic Block: Preference Profiles (Instances)

Agreed # Proposed key point for discussion
approved 10 In a user preference profile instance, the values of keys that are not present are unknown (incomplete profile). (Discussion)
approved 11 In a user preference profile instance, one key may occur multiple times, but only with different values. (Discussion)
2 Are the user profiles themselves all flat or are they layered? (Discussion)
12 Do we need tags attached to keys, for example to link eMail addresses to users or specify languages? (Meaning unclear.) (Discussion)
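
Key points 10 and 11 can be read as constraints on a simple data structure. The following is only a sketch under that reading: the class name, method names and the example key are hypothetical, and the open question of flat versus layered profiles (item 2) is not addressed.

```python
from collections import defaultdict
from typing import List, Optional


class PreferenceProfile:
    """Hypothetical in-memory profile instance (illustration only)."""

    def __init__(self) -> None:
        self._values = defaultdict(list)

    def add(self, key: str, value: str) -> bool:
        """Add a key-value pair.

        A key may occur several times, but only with different values
        (key point 11), so an identical pair is rejected.
        """
        if value in self._values[key]:
            return False
        self._values[key].append(value)
        return True

    def get(self, key: str) -> Optional[List[str]]:
        """Return all values for a key, or None if the key is absent.

        None means "unknown", not "no preference" (key point 10).
        """
        values = self._values.get(key)
        return list(values) if values else None


profile = PreferenceProfile()
key = "http://gpii.net/ns/language"  # hypothetical property URI
first = profile.add(key, "en")
second = profile.add(key, "de")
duplicate = profile.add(key, "en")   # rejected: same key, same value
```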

Topic Block: Conflicts

There are some situations where conflicts could arise. The Profile Conflicts page shows some examples and already offers some solutions. The goal of this discussion is to identify conflict scenarios and agree on a solution. There should be two major types of conflicts: Conflicts which arise from a user having settings for multiple devices and switching between them and conflicts which arise due to settings relying on contextual information, like the time of the day.

It should be pretty safe to say that the target devices - even including software influence like the OS version - don't change very often and if they change, we can treat them as a new device. There are two options to deal with device specific settings:

  • Make the setting keys depending on the target device. This way we will end up with multiple entries per key per profile, like .../windows/volume and .../android/volume

Context-specific settings change very often, because many environmental influences, such as the time of day, are not discrete. We will probably have to store conditions along with each setting to deal with these semantically complex relations. Some examples can be found on the Profile Conflicts page.

  • Conditions should be JavaScript expressions evaluating to a boolean value
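
To make the idea of a condition attached to a setting concrete, here is a small sketch. It models a condition as a Python predicate over a context dictionary rather than as a JavaScript expression, purely for illustration. The key "volume", the context field "hour" and the "first matching entry wins" rule are assumptions; how priorities should work is still open (see item 35).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Context = Dict[str, int]  # hypothetical context, e.g. {"hour": 23}


@dataclass
class ConditionalSetting:
    key: str
    value: str
    condition: Callable[[Context], bool]  # evaluates to a boolean


def resolve(settings: List[ConditionalSetting], key: str,
            context: Context) -> Optional[str]:
    """Return the value of the first entry for `key` whose condition holds.

    Assumption: entries are checked in order and the first match wins.
    Returns None when no entry applies.
    """
    for setting in settings:
        if setting.key == key and setting.condition(context):
            return setting.value
    return None


settings = [
    # Quieter at night (between 22:00 and 07:00).
    ConditionalSetting("volume", "30",
                       lambda ctx: ctx["hour"] >= 22 or ctx["hour"] < 7),
    # Unconditional fallback.
    ConditionalSetting("volume", "70", lambda ctx: True),
]

night = resolve(settings, "volume", {"hour": 23})
day = resolve(settings, "volume", {"hour": 12})
```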
Agreed # Proposed key point for discussion
done 27 OBE: How do we deal with conflicts in profiles? (Discussion)
8 How to handle items that are duplicates: Identical meaning but a different name? (Discussion)
35 A property-value pair may have a priority attached? (Discussion)
Conflicts from Multiple Devices:
30 A device-specific property value is attached to a device-specific property name? (Discussion)
31 An application-specific property value is attached to an application-specific property name? (Discussion)
32 A platform/OS specific property value is attached to a platform/OS specific property name? (Discussion)
Conflicts from Context based Conditions:
33 Is a property value that is specific to a context aspect expressed via a condition attached to a property-value pair? (Discussion)
34 What is a condition? (Discussion)

Topic Block: Ontologies and Views

An overview of proposed integration of property URLs and ontologies can be found at http://wiki.fluidproject.org/display/ISO24751/User+Profile+Illustration

Agreed # Proposed key point for discussion
4 Should we have URLs rather than the more general URI? (Discussion)
36 Who is responsible for filling in the ontology information when registering a new property name? (Discussion - empty)
23 Should views be specified on top of the key-value pairs, each defining a particular structure/ontology for the key-value pairs? (Discussion - empty)
24 Are views specified externally to the registry (e.g. as RDF/OWL files on a Web server)? (Discussion - empty)

Topic Block: Inferred Preferences

A matchmaker may infer a property-value pair. Inferred preferences have neither been set nor confirmed by the user.

Agreed # Proposed key point for discussion
38 Inferred preferences are matchmaker-specific. (Discussion - empty)
41 We don't need to specify probability values for inferred properties (cf. #38). (Discussion - empty)
42 An inferred preference can become a regular preference through user confirmation. (Discussion - empty)
39 Inferred preferences are not stored in the user profile. (Discussion - empty)
40 Inferred preferences are not stored in the profile server, but may be stored in a matchmaker-specific database. (Discussion - empty)
under discussion 13 OBE: In a user preference profile instance, each entry (property-value pair) does not have a probability assigned (cf. #38). (Discussion)
37 OBE (because matchmaker-specific and out of scope for standard): A probability expresses the certainty of an inference made by a specific matchmaker. It is a value in the range [0; 1]. The default value is 1. (Discussion)

Other Decisions

Language Codes

One of the terms in the current version of the Registry is language (description: "a preference for the language of the user interface"). The value space is tentatively defined as the values defined by ISO 639-2/T. ISO 639-2/T identifies languages by means of three-letter codes (instead of the ISO 639-1 two-letter codes that are commonly used in HTML pages) without a means of identifying variants (see also the list of ISO 639-2 codes on Wikipedia).

Proposal

Use IETF BCP 47 instead of ISO 639-2/T as the format for identifying languages.

  • BCP 47 defines a language tag as consisting of a primary language subtag, followed by several optional subtags (especially for script, region and/or variant).
    • Scripts can be identified by means of codes defined by ISO 15924:2004. For example, zh-Hans and zh-Hant have sometimes been used to distinguish between Chinese with Simplified Characters and with Traditional Characters, respectively. The registration authority for ISO 15924 tags is the Unicode Consortium; see Codes for the representation of names of scripts.
    • Regions, including countries, can be identified by means of codes defined by ISO 3166-1. An ISO 3166-1 decoding table is available on the ISO website. The list of alpha-2 country codes (in TXT, HTML or XML) is available free of charge for internal use and non-commercial purposes. The full ISO 3166-1:2006, which also contains the alpha-3 codes and the numeric codes, is not available free of charge.
  • BCP 47 allows the use of three-letter codes for primary language tags defined by ISO 639-3. The registration authority for ISO 639-3 tags is SIL International; see ISO 639-3 Registration Authority. Using ISO 639-3 has several advantages:
    • This list is more complete than ISO 639-1 and ISO 639-2.
    • ISO 639-3 provides more precision for the identification of languages: some of the ISO 639-1 codes actually referred to macrolanguages, for example zh (Chinese) and ar (Arabic). The ISO 639-3 list distinguishes between macrolanguages and sublanguages, for example zho (Chinese) has sublanguages such as cmn (Mandarin), hak (Hakka) and yue (Yue or Cantonese). These distinctions can trigger different Braille conversion tables or text-to-speech engines (e.g. Ekho supports Cantonese, Mandarin and Zhaoan Hakka), so these distinctions are relevant to accessibility. See the ISO 639-3 Macrolanguage Mappings.
    • Three letter codes also allow us to identify sign languages. ISO 639-2 contains the tag "sgn" for sign language (which would need to be refined with subtags), and ISO 639-3 contains tags for individual sign languages, such as ase (American Sign Language), asf (Australian Sign Language) and sgg (Swiss-German Sign Language). ISO 639-1, by contrast, contained no tags to identify sign languages.
  • BCP 47 is also the standard for values of lang and xml:lang in HTML5.
  • ISO standards can use IETF RFCs and BCPs as normative references.
  • General rule: use the shortest code if both a two-letter code (ISO 639-1) and a three-letter code (ISO 639-2 or ISO 639-3) exist.
    • BCP 47 states: 'Each encompassed language's subtag SHOULD be used as the primary language subtag. For example, a document in Mandarin Chinese would be tagged "cmn" (the subtag for Mandarin Chinese) in preference to "zh" (Chinese)'
      The implications of using three-letter codes (for "sublanguages") instead of two-letter codes (for "macrolanguages") may need some investigation. For example, the Ekho TTS engine may support cmn, hak and yue, but it is not clear if Orca and NVDA can handle three-letter codes. (I.e. a matchmaker or a transformer may need to "translate" a three-letter code into a two-letter code for some screen readers or TTS engines.)
      BCP 47 states: 'If compatibility is desired or needed, the encompassed subtag MAY be used as an extended language subtag. For example, a document in Mandarin Chinese could be tagged "zh-cmn" instead of either "cmn" or "zh".'
    • ISO 639-1 has no codes for sign languages. Both ISO 639-2 and ISO 639-3 enable the identification of sign languages. ISO 639-2 uses 'sgn', which is meant to be supplemented with a region subtag, e.g. 'sgn-US' for American sign language. ISO 639-3 has separate language codes for individual sign languages, e.g. 'ase' for American sign language. According to Michael Everson the ISO 639-2 codes were "to be deprecated in favour of ISO 639-3 codes". According to BCP 47, 'sgn-US' is still valid but deprecated; 'ase' or 'sgn-ase' is preferred.
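
The tag structure described above, and the possible need to "translate" a three-letter code into a two-letter code, can be sketched as follows. This is a deliberately simplified parser, not a full BCP 47 implementation: it only handles language, one optional extended language subtag, script and region, and ignores variants, extensions, private use and grandfathered tags. The fallback table is an illustrative excerpt based on the Chinese examples above, not a complete macrolanguage mapping.

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    extlang: Optional[str]
    script: Optional[str]
    region: Optional[str]


# Simplified pattern: language[-extlang][-script][-region]
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<extlang>[A-Za-z]{3}))?"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple language tag into subtags, using conventional casing."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed language tag: {tag!r}")
    return LanguageTag(
        language=match["language"].lower(),
        extlang=match["extlang"].lower() if match["extlang"] else None,
        script=match["script"].title() if match["script"] else None,
        region=match["region"].upper() if match["region"] else None,
    )


# Illustrative excerpt only; a real table would be derived from the
# ISO 639-3 macrolanguage mappings.
MACROLANGUAGE_FALLBACK = {"cmn": "zh", "hak": "zh", "yue": "zh"}


def legacy_language(tag: str) -> str:
    """Return a primary subtag for engines limited to macrolanguage codes."""
    language = parse_tag(tag).language
    return MACROLANGUAGE_FALLBACK.get(language, language)
```

A matchmaker or transformer could call something like legacy_language before passing a language preference to a screen reader or TTS engine that does not handle three-letter codes; whether that is the right place for the conversion is an open design question.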

Notes

  • While the set of languages supported by assistive technologies is only a very small subset of the (over 5000) living languages, it is also important to support the matching of resources in specific languages (including subtitles, captions, etc) with languages that a user understands, and this is probably a much wider range than what is supported by AT.
  • Implementations would need to synchronise their list of languages with the list maintained by SIL International (the registration authority for ISO 639-3), since language tags may be retired (see the Retired ISO 639-3 Codes).
  • Implementations would need to synchronise their list of country codes with the list maintained by the ISO 3166 Maintenance Authority, since country codes may be added or withdrawn (e.g. the country code for Yugoslavia was withdrawn).
  • There are a few special language codes:
    • Content in an undetermined language can be tagged with 'und' (ISO 639-2 and ISO 639-3). BCP 47 points out that this tag should only be used if a language tag is required.
    • Content in an uncoded language can be tagged with 'mis' (ISO 639-2 and ISO 639-3), i.e. the language is known but has no language code.
    • Non-linguistic content can be tagged with 'zxx' (ISO 639-2 and ISO 639-3), e.g. sound recordings with only nonverbal sounds, instrumental music, programming source code.
    • Content in multiple languages can be tagged with 'mul' (ISO 639-2 and ISO 639-3). BCP 47 points out that this tag "SHOULD NOT be used when a list of languages or individual tags for each content element can be used instead".
  • There is no "default country code" for languages, so if content is tagged with only "eng" (English), there is insufficient information to decide, for example, whether an American, Canadian, British or Australian Braille translation table should be used.
  • The language tags described in IETF BCP 47 "are sequences of characters from the US-ASCII [ISO646] repertoire". (This does not prohibit the use of language tags in UTF-8 content. As Wikipedia points out: "The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well.")
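
An implementation would probably want to treat the special codes listed above separately from ordinary language tags before attempting any matching. A minimal sketch, in which the function name and the description strings are our own:

```python
from typing import Optional

# The four special codes from the notes above, with short descriptions.
SPECIAL_LANGUAGE_CODES = {
    "und": "undetermined language",
    "mis": "uncoded language",
    "zxx": "no linguistic content",
    "mul": "multiple languages",
}


def special_meaning(tag: str) -> Optional[str]:
    """Return a description if the primary subtag is a special code, else None."""
    primary = tag.split("-", 1)[0].lower()
    return SPECIAL_LANGUAGE_CODES.get(primary)
```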

Richard Ishida's article Language tags in HTML and XML also contains a high-level overview of IETF BCP 47.

Result

A last call for comments was sent to the GPII Architecture and the AccessForAll mailing lists on Thursday 11 October, with a request to raise objections or send comments by 15 October; otherwise, the proposal would be considered as accepted by common consent. No comments or rejections were sent by 15 October, so the proposal now reflects common consent.

(Prior to this request, a last call for comments was sent to the GPII Architecture and the AccessForAll mailing lists on 5 October.)


Decision: Use IETF BCP 47 instead of ISO 639-2/T as the format for identifying languages.
Date: 2012-10-23
By whom: Needs and Preferences Working Group (GPII Architecture and AccessForAll working groups) (See process just above)
Status: Pending for 30 days until 23 November 2012



Format of Registry Names

Issue

  • Many of the names in the Registry for Property and Condition Terms contain spaces, e.g. "adaptation type", "background colour" and "colour coding avoidance". These names are what we call the "Local Unique IDs".
  • The example preference sets on GitHub use camel case for these names, e.g. screenEnhancement, foregroundColor, etc.
  • We wanted to use URIs as property names (see item 3 on the page Discussion on Profile Structure: "Property names are URIs, values are of any type that can be stored as a string").
  • In URIs, spaces are special characters that need to be percent-encoded; see RFC 3986: "Uniform Resource Identifier (URI): Generic Syntax". For example, a space would be encoded as %20.

Question: Should we keep the spaces in the names and percent-encode them in URIs, or should we rename our Local Unique IDs in camel case to avoid percent-encoding of space characters? (Or is the Registry going to provide an API that does automatic conversion between camel case and the notation with spaces?)
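
The two options in the question can be compared with a short sketch. It uses Python's standard urllib.parse.quote for percent-encoding; the two conversion helpers are hypothetical and only show what an automatic conversion between the notations might look like for simple names (acronyms and digits are not handled).

```python
import re
from urllib.parse import quote, unquote


def to_camel_case(name: str) -> str:
    """Convert a name with spaces, e.g. "background colour", to camel case."""
    words = name.split()
    if not words:
        return ""
    first = words[0][:1].lower() + words[0][1:]
    rest = [word[:1].upper() + word[1:] for word in words[1:]]
    return first + "".join(rest)


def from_camel_case(name: str) -> str:
    """Convert a camel-case name back to lowercase words separated by spaces."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", name).lower()


# Option 1: keep the spaces and percent-encode them in URIs (RFC 3986).
encoded = quote("background colour")
decoded = unquote(encoded)

# Option 2: rename the Local Unique ID in camel case; no encoding is needed.
camel = to_camel_case("colour coding avoidance")
spaced = from_camel_case("screenEnhancement")
```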

Discussion

  1. We can't impose a naming convention, since we're going to accept terms from vendors, independent organisations etc. The API is an interesting idea, especially for avoiding the same term being defined several times with the very same meaning (e.g.: "foreground colour" && "foreground-colour" && "foregroundColour") and that's something that the transformer could handle. (Andres)
    • Note: Can the aliases solve the issue of "synonyms"?
  2. The semantic framework ontology, which will be the backbone of the solutions registry, together with the API and the alignment tool that will serve as the "interface" between the ontology and the solutions registry, is going to "solve" the aforementioned problem of terms with the same meaning. (Kostas)
    • Note: Shouldn't it be possible to resolve synonyms without recourse to ontologies, i.e. by querying the Registry directly?
  3. It does not matter very much what convention people use, since the unique identifier of the object being named will always be a URI. (Liddy)
  4. It should be acceptable to impose some minimal naming conventions to make it easier for programmers in a variety of languages to use these local names without extra effort. It doesn't seem inappropriately restrictive to simply ask contributors to the registry to avoid certain characters (such as spaces) when specifying their local names. Validation in the Registry's UI can prompt the user, and if we find a desire for more consistency beyond that, we can come up with a style guide for contributors. (Colin)
    • For the time being we shouldn't suggest including this naming convention as part of ISO 24751. Let's work it out, try it in practice for a while, and then suggest it if we find it's a successful approach. (Colin)
  5. Some prefer the underscore convention, others the camel case convention, which is also used in Dublin Core.
  6. The naming convention is not only about underscores versus camel case, but also about variant spellings like "color" (US English) versus "colour" (British English, Canadian English, Australian English).
  7. Underscores create problems when text is underlined (or is in a link that is automatically underlined), because the underscores in the name are then not visible.   CamelCase resolves this. (Gregg)
  8. When speaking of a preference being a URI I presume that the first part of the URI would be the address of the registry being used, and the last part would be the term in the registry?  (Gregg)
  9. Aliases would appear to resolve the problem of people using different forms (re. note # 1 and 2).  Couldn't aliases also be used to resolve any spelling differences? (e.g. backgroundColour and backgroundColor. )   (I think we should keep things simple)  (Gregg)
  10. In ontologies, camel case is the general convention: class names start with an uppercase letter; property names start with a lowercase letter. See also the sentence: "This statement uses a common convention that class names are written with an initial uppercase letter, while property and instance names are written with an initial lowercase letter. However, this convention is not required in RDF Schema." in the W3C RDF Primer (2004).
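
The alias idea raised in points 1 and 9 could be sketched as a simple lookup from variant forms to one canonical local name. The table below is hypothetical: no canonical form has been decided, and "foregroundColour" is chosen here only for illustration.

```python
# Hypothetical alias table: variant local names -> canonical local name.
ALIASES = {
    "foreground colour": "foregroundColour",
    "foreground-colour": "foregroundColour",
    "foregroundColor": "foregroundColour",
}


def canonical_name(local_name: str) -> str:
    """Return the canonical local name, or the name itself if it has no alias."""
    return ALIASES.get(local_name, local_name)


resolved = sorted({canonical_name(name) for name in
                   ["foreground colour", "foreground-colour",
                    "foregroundColor", "foregroundColour"]})
```

With such a table, all four spellings resolve to a single term, which addresses both the separator variants and the US/British spelling difference without recourse to an ontology.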

Tentative Decision to try out for the reference or CORE terms

We stated that we would try things out for a while to see what works.  What will we be trying?  What will we try out as the canonical form for terms that we introduce?   camelCase?   Other?

Descriptive Title for Decision: Name Format for Terms in Registry (not deciding until we try things out for a while)
Date: 2012-10-23
By whom: Needs and Preferences Working Group (GPII Architecture and AccessForAll working groups) (See process just above)
Status: No Final Decision. No decision yet on what to try.

Current Practice for Common Terms

During the first pilot phase in Cloud4all, the following conventions were used for common terms:

  • All common terms start with the URI http://registry.gpii.org/common/.
  • The part after the URI follows one of the following conventions:
    • "flat" term, e.g. magnifierEnabled
    • "hierarchical" term, e.g. display.screenEnhancement.magnification
    • provisional term, e.g. display.screenEnhancement.-provisional-magnifierPosition

These conventions are reflected in the page Cloud4all Testing: Essential Registry Terms (version of 25 October 2013).

The flat terms are converted to hierarchical terms by a "transformer". See also JIRA ticket GPII-336.
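
The three conventions listed above can be told apart mechanically. The sketch below assumes that a provisional term is recognised by the "-provisional-" prefix on its last segment and that hierarchical terms are those containing a dot; the function names are hypothetical, and the flat-to-hierarchical mapping applied by the transformer is not reproduced here.

```python
COMMON_TERMS_BASE = "http://registry.gpii.org/common/"
PROVISIONAL_MARKER = "-provisional-"


def term_kind(term: str) -> str:
    """Classify the part after the base URI as flat, hierarchical or provisional."""
    last_segment = term.rsplit(".", 1)[-1]
    if last_segment.startswith(PROVISIONAL_MARKER):
        return "provisional"
    return "hierarchical" if "." in term else "flat"


def term_uri(term: str) -> str:
    """Build the full URI of a common term."""
    return COMMON_TERMS_BASE + term


kinds = {
    term: term_kind(term)
    for term in [
        "magnifierEnabled",
        "display.screenEnhancement.magnification",
        "display.screenEnhancement.-provisional-magnifierPosition",
    ]
}
```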

Glossary

See Also