[slightly edited]

From: Arjun Ray <aray@nyct.net>
To: xml-dev@lists.xml.org
Date: Wed, 18 Sep 2002 07:45:05 +0000
Message-ID: <3d2gou4orr2sc3perbt2o5vos9bhf7igfh@4ax.com>
Subject: [xml-dev] XMap: A Mechanism for Mapping Names


Eventually this will become a paper with all the gory details, but here is
a preview of a markup scheme which I believe

  a.  constitutes an alternate non-invasive syntax for XML Namespaces.
  b.  allows annotation of values in multiple taxonomies simultaneously.
  c.  enables name mapping in the style of Architectural Forms.

1. Basics.

XMap is a method to record the details of name (re)mapping in the values
of specially defined extra attributes.  

These values are lists of tokens, most of which are names of attributes;
the exceptions are keywords with defined semantics.  The basic idea is to
associate names in one taxonomy with names in another by a list of pairs
(e1 l1 e2 l2 ...), where external name e1 is associated with local name
l1, external name e2 with l2, and so on.  The general interpretation is
that an attribute specification l1="v" in the local taxonomy - i.e. the
instance markup - would be (circumstantially) the same as an attribute
specification e1="v" in the external taxonomy.

This list will be the value of a _mapping control attribute_ (to nominate
it in true SGML longwinded style), whose local name is not constrained: it
can be chosen to avoid conflicts. 

The specification of this local name will be in another list of pairs
comprising the value of a _mapping support attribute_.  The name of this
support attribute is constrained because it must be recognized by generic
software: it is not tied to any application semantic (i.e. recognizing it
does not require understanding the application markup and its meaning).  A
reserved name in XML is easy: use the 'xml' prefix.  Hence the name
'xmlmap' for the mapping support attribute.  (In SGML, even this name
would have to be declared, probably through a processing instruction in
the prolog; this technique could work for XML too.)

The pairs in the value of the mapping support attribute (xmlmap) associate
a _distinguished name_ with the local name of a mapping control attribute.
Several interpretations of this distinguished name are possible, such as,
inter alia: the name of a local attribute whose value is a Namespace URI,
the name of an SGML architecture, or even the name of a declared notation.

XMap has rules to determine which interpretation applies.  The realities
of the XML world dictate that the default interpretation, requiring no
explicit recording, is that of "XML Namespaces".

An example, somewhat deliberately obfuscated in order to bring out the
generality, using categories from the HTML taxonomy:

      <img src="some.gif"
           xmlmap="zz yy"
           zz="http://www.w3.org/1999/xlink"
           yy="href src type foo role bar title quux"
           foo="simple"
           bar="http://example.org/some/role"
           quux="Sleeping Kitty"
           >

The support attribute xmlmap identifies zz as the name of a namespace
attribute, and yy as the corresponding control attribute to translate
local names to names in the namespace identified by zz.  The effect of
this markup is to identify four "translated" attribute specifications,

           href="some.gif"
           type="simple"
           role="http://example.org/some/role"
           title="Sleeping Kitty"

understood in the light of the namespace specification,

           zz="http://www.w3.org/1999/xlink"           
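
As an illustration only (this is not part of any XMap specification), the
lookup just described can be sketched in a few lines of Python.  The function
name translate and the dict-of-attributes representation are assumptions made
for the sketch; it handles plain name pairs under the default "XML Namespaces"
interpretation and ignores keywords.

```python
def translate(attrs):
    """Return {namespace_uri: {external_name: value}} for one element.

    attrs holds the element's attribute specifications as written.
    Assumption: every distinguished name in xmlmap names a local
    attribute whose value is a namespace URI (the default reading).
    """
    def pairs(value):
        # Split a token list into (first, second) pairs.
        tokens = value.split()
        if len(tokens) % 2:
            raise ValueError("expected an even number of tokens: %r" % value)
        return list(zip(tokens[0::2], tokens[1::2]))

    result = {}
    for distinguished, control in pairs(attrs.get("xmlmap", "")):
        uri = attrs[distinguished]
        mapped = {}
        for external, local in pairs(attrs[control]):
            # l="v" locally is read as e="v" in the external taxonomy.
            mapped[external] = attrs[local]
        result[uri] = mapped
    return result


img = {
    "src": "some.gif",
    "xmlmap": "zz yy",
    "zz": "http://www.w3.org/1999/xlink",
    "yy": "href src type foo role bar title quux",
    "foo": "simple",
    "bar": "http://example.org/some/role",
    "quux": "Sleeping Kitty",
}
translated = translate(img)
```

The single value "some.gif" is entered once as src and surfaces again as href
in the external taxonomy; nothing in the instance is copied.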

Had all the external names except 'href' been used as-is locally, the
control attribute would have been:

           yy="href src type type role role title title"

The redundancy can be obviated by the keyword ":auto", which is paired
with the name of a local attribute, whose value is a list of the names
that are taken as-is in the translation.  Thus, the markup could be

      <img src="some.gif"
           xmlmap="zz yy"
           zz="http://www.w3.org/1999/xlink"
           yy="href src :auto xx"
           xx="type role title"
           type="simple"
           role="http://example.org/some/role"
           title="Sleeping Kitty"
           >
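
A rough sketch of how a processor might expand ":auto" back into explicit
pairs follows.  The helper name expand_auto is hypothetical, and the sketch
assumes the control value is a well-formed list of pairs.

```python
def expand_auto(control_value, attrs):
    """Expand ':auto' pairs in a control attribute value.

    Returns a list of (external_name, local_name) pairs.  A pair of the
    form ':auto x' is replaced by (n, n) for every name n listed in the
    value of the local attribute x.
    """
    tokens = control_value.split()
    expanded = []
    for first, second in zip(tokens[0::2], tokens[1::2]):
        if first == ":auto":
            for name in attrs[second].split():
                expanded.append((name, name))
        else:
            expanded.append((first, second))
    return expanded


img = {
    "src": "some.gif",
    "xmlmap": "zz yy",
    "zz": "http://www.w3.org/1999/xlink",
    "yy": "href src :auto xx",
    "xx": "type role title",
    "type": "simple",
    "role": "http://example.org/some/role",
    "title": "Sleeping Kitty",
}
explicit_pairs = expand_auto(img["yy"], img)
explicit_value = " ".join(e + " " + l for e, l in explicit_pairs)
```

Here explicit_value comes out as the redundant form of the control attribute
that ":auto" was introduced to avoid.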

2.  "Unnamed" Attributes.

Generic identifiers (which are really unnamed attributes) are mapped by
the keyword ':gi'.  By default, generic identifiers are not mapped.
Elements in SGML/XML have another unnamed attribute besides the GI: the
data content of the element.  The keyword ':data' identifies this.  (This 
is of use to AF wonks; it's of no relevance to XML Namespaces.)  Mapping 
an external GI has the following possibilities:

  :gi foo    - The external GI is the value of the local foo attribute.
  :gi :gi    - The external GI is the same as the local GI.
  :gi :data  - The external GI is the text content of the local element.
  :gi :null  - The external GI is some category known to the external
               taxonomy (this is the default).
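
The four cases can be sketched as a small Python function.  This is one
reading of the table above, not a normative rule: the name external_gi, the
'kind' attribute in the usage lines, and the choice of None to stand for
"left to the external taxonomy" are all assumptions of the sketch.

```python
def external_gi(control_value, local_gi, attrs, text=""):
    """Resolve the external GI for one element.

    control_value is the value of the mapping control attribute,
    local_gi the element's own generic identifier, attrs its attribute
    specifications, and text its data content.
    """
    tokens = control_value.split()
    target = ":null"  # default: no ':gi' pair present
    for first, second in zip(tokens[0::2], tokens[1::2]):
        if first == ":gi":
            target = second
    if target == ":null":
        return None          # decided by the external taxonomy
    if target == ":gi":
        return local_gi      # same as the local GI
    if target == ":data":
        return text          # text content of the local element
    return attrs[target]     # value of the named local attribute


# Hypothetical element: <para kind="note">warning</para>
by_attribute = external_gi(":gi kind", "para", {"kind": "note"}, "warning")
by_local_gi = external_gi(":gi :gi", "para", {"kind": "note"}, "warning")
by_data = external_gi(":gi :data", "para", {"kind": "note"}, "warning")
unmapped = external_gi("href src", "para", {"kind": "note"}, "warning")
```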


3.  Instance Markup and DTDs/Schemas.

XMap is based on instance markup because the association with an external
taxonomy could be circumstantial.  When it isn't - i.e. the association is
systematic for a local element of a given type - the instance markup can
be economised by appropriate ATTLIST declarations in the DTD (often in the
internal subset.)  E.g.

  <!ATTLIST  img
             xmlmap   NMTOKENS   #FIXED  "xlink xlmap"
             xlink    CDATA      #FIXED  "http://www.w3.org/1999/xlink"
             xlmap    NMTOKENS   #FIXED  "href src :auto xlauto"
             xlauto   NMTOKENS   #FIXED  "type role title"
             >

      <img src="some.gif"
           type="simple"
           role="http://example.org/some/role"
           title="Sleeping Kitty"
           >
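
To see the defaulting at work, here is a minimal sketch using Python's
standard xml.etree.ElementTree, whose underlying expat parser supplies
attribute defaults declared in the internal subset.  The wrapper element
'doc' is an assumption of the sketch, and the img start-tag is closed with
'/>' so that the document is well-formed XML.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE doc [
  <!ATTLIST  img
             xmlmap   NMTOKENS   #FIXED  "xlink xlmap"
             xlink    CDATA      #FIXED  "http://www.w3.org/1999/xlink"
             xlmap    NMTOKENS   #FIXED  "href src :auto xlauto"
             xlauto   NMTOKENS   #FIXED  "type role title"
             >
]>
<doc><img src="some.gif"
          type="simple"
          role="http://example.org/some/role"
          title="Sleeping Kitty"
          /></doc>"""

# The parser reports the #FIXED attributes alongside the ones actually
# written in the instance, so generic XMap software sees all of them.
img = ET.fromstring(DOC).find("img")
attrs = dict(img.attrib)
```

After parsing, attrs carries xmlmap, xlink, xlmap and xlauto in addition to
the four attributes that appear in the instance, so the mapping can be
resolved exactly as if it had been spelled out in the start-tag.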

4.  Some Topical Notes.

One important feature of XMap is that all mappable names are in attribute
*values*, where there is no ambiguity when the semantic of the attribute
name (a control, typically) is known.  Thus, compound names with internal
syntax are not necessary: documents can be parsed with simple APIs such as
SAX1.  This is a simplicity win, at the cost of, usually, at most two extra
attributes.

There is also the feature that data values can be shared by authorial
intent (as opposed to relying on heuristics such as string equivalence for
potentially repeated values) by simply entering the value once and mapping
all other names.  The xlink:href versus href/src problem simply vanishes.

Perhaps more topically, XMap covers all the notational requirements of XML
Namespaces, rendering colonification unnecessary, and permits the same
defaulting rules for "applicable namespaces" (when the namespace is not
identified by URI in the starttag but merely declared by name in the
xmlmap attribute.) 

The connection with AFs (and thus the details I've left out here) probably
doesn't need comment.