Re: The Reference Concrete Syntax is not Current Practice (Was Re:

Glenn Adams (glenn@stonehand.com)
Tue, 26 Sep 95 09:10:15 EDT
Date: Mon, 25 Sep 95 23:37:27 EDT
From: Arjun Ray <aray@pipeline.com>

SGML takes no prisoners: parsing and validation are identical concepts.
Scream-and-die when something doesn't validate is the SGML way.

It may be the way of certain implementations, but it certainly need not be
the case. Given that error recovery is not mentioned, let alone standardized,
by ISO 8879, an implementation has all the leeway it wishes to implement error
recovery. Of course, problems lurk here too, since certain lexical errors
may not be noticed at all in real SGML, e.g.:

<A HREF="foo.htm>click here</A> and <A HREF=bar.htm">click there</A>

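A conforming parser reads everything between the two quotation marks as a
single attribute value literal and so sees no error, while the mistake might
be noticed by a non-compliant parser, e.g., one that terminates a tag with
&#62; ('>') irrespective of whether it appears in an attribute value literal.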
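
To make the difference concrete, here is a rough sketch in Python of the
two scanning strategies applied to the example above. It is an
illustration only, not an SGML parser: the function names are
hypothetical, and the quote handling is a simplification of the
delimiter rules in ISO 8879.

```python
def scan_start_tag_quote_aware(text, start=0):
    """Scan one start tag, treating '>' inside a quoted attribute
    value literal as data. Returns (tag_text, index_after_tag)."""
    if text[start] != "<":
        raise ValueError("no tag at this position")
    quote = None  # the quote character that opened the current literal
    i = start + 1
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == quote:      # matching quote closes the literal
                quote = None
        elif ch in ('"', "'"):   # a quote opens a literal
            quote = ch
        elif ch == ">":          # '>' outside any literal ends the tag
            return text[start:i + 1], i + 1
        i += 1
    raise ValueError("unterminated tag")


def scan_start_tag_naive(text, start=0):
    """Scan one start tag by stopping at the first '>', quoted or not.
    Returns (tag_text, index_after_tag, has_unbalanced_quote)."""
    end = text.index(">", start)
    tag = text[start:end + 1]
    # An odd number of double quotes means a literal was left open.
    unbalanced = tag.count('"') % 2 == 1
    return tag, end + 1, unbalanced


sample = ('<A HREF="foo.htm>click here</A> and '
          '<A HREF=bar.htm">click there</A>')

aware_tag, aware_end = scan_start_tag_quote_aware(sample)
naive_tag, naive_end, naive_unbalanced = scan_start_tag_naive(sample)
```

The quote-aware scan swallows everything up to the second quotation mark
as one HREF value and leaves only "click there</A>" as content, with
nothing to complain about. The naive scan stops at the first '>' and is
left holding a tag with an unclosed quote, which is the point at which
it could report the error.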

The fragility of the Concrete Syntax reflects this, as does the
unwillingness to distinguish lexical tokenization from content model
enforcement as essentially *different* meanings of "parsing".

In what way is the CS more fragile than other formal languages, e.g., C++
or LISP? How should/could lexical and syntactic levels be distinguished
in ISO 8879? Isn't this distinction simply an aspect of an implementation
rather than an aspect of the language? How does a language specification
such as that of C++ draw this distinction?

Regards,
Glenn Adams