From: Arjun Ray
Newsgroups: comp.text.sgml
Subject: LPD Before DTD in the Prolog
Date: Wed, 13 Dec 2000 01:39:34 -0500
Organization: FUDGE Dispersal Systems
Message-ID:

SUMMARY:

Having re-read - for the nth time! - all the stuff on LINK in the Handbook, such as the Nutshell tutorial and the extensive commentary on Clause 12, I'm convinced that the provisions in ISO 8879 as they stand are a misspecification. Not only is LINK problematic (e.g. all the hoop-jumping with the order of applicable entity declarations), it is actually impossible to implement consistently in the general case.

All of the difficulties stem from the requirement that LPDs follow DTDs in the prolog; just about all of them can be solved by reversing this order, i.e. by requiring that LPDs precede DTDs. A new keyword in the FEATURES section of the SGML declaration could differentiate the new style of prolog order from the old.

1. LPDs after DTDs are inconvenient for entity management/storage.

It's normal to store (sub)documents in the form of doctype + instance. But when LPDs are in play, we can't do this because the LPDs go in the middle. And, since usually there are choices in LPDs, we're forced to store instances as text entities only and juggle prologs in separate "driver" files. This entire rigamarole would be obviated if LPDs were placed in front of DTDs rather than after them.

2. Byzantine rules for the order of applicable entity declarations.

This is actually a clash between two design objectives. On the one hand, an LPD should have the ability to preempt entity declarations in a DTD for its own processing run. On the other, it should be possible to "re-use" parameter entities declared in the DTD, if only to get a better handle on consistency. (For instance, exploiting a PE that bundles a whole bunch of associated element types.)
Of these two desirable features, the second is actually only a convenience - since both the LPD and the DTD could just as well transclude an external set of parameter entity declarations - while the first is crucial. But if entity declarations in the LPD are to be given priority, it makes no sense to have them appear after the DTD in a linear forward scan of the document. This is a reversal of the "first declaration counts" rule, and counter-intuitive in the extreme.

Even worse, a DTD can't be parsed until the LPD that comes after it has been parsed - this is because PEs (which *may* depend effectively on what the LPD says) make the structure of the DTD opaque enough to preclude any profitable tokenization.

All this needless buffering and to-ing and fro-ing - not to mention the elaborate explanations in the Handbook about the alleged difference between "interpretation" and "parsing" (I just couldn't buy it, sorry, it looked too bizarre) - would be obviated if the *natural* order of the applicable entity declarations were reflected simply and intuitively: LPD preceding DTD.

3. The "loss" from the proposed new order, of course, is that PEs in an LPD could no longer refer to declarations in the DTD (as this would constitute use before declaration). But actually, this is not a loss at all, it's only the elimination of a misfeature that *creates* the possibility of a contradiction, a situation like the "all Cretans are liars" self-referential paradox. Proof:

]]> ]> ]]> ]> blah

Any assumption about the replacement text of %foo in the LPD will be falsified, and as a result %bar will be indeterminate too. FWIW, even nsgmls throws its hands up, with this:

: nsgmls:foo.sgm:27:0:E: definition of parameter entity "bar" is unstable
: nsgmls:foo.sgm:27:0:E: definition of general entity "color" is unstable
: ACOLOR CDATA green
: (EXP
: -blah
: )EXP

"Unstable" clearly implies an "implementation-defined" resolution of a logical contradiction.
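
To make the ordering argument in point 2 concrete, here is a minimal Python sketch of a "first declaration counts" lookup built in one forward scan. It is only an illustration, not how any SGML parser is actually written: the function name resolve_entities, the entity names, and the replacement texts are all hypothetical, and real entity handling (parameter vs. general entities, marked sections, external subsets) is ignored.

```python
def resolve_entities(declaration_sets):
    """Build an entity table in one forward pass; the first declaration wins."""
    table = {}
    for source, declarations in declaration_sets:
        for name, text in declarations:
            # Later declarations of an already-seen name are ignored.
            if name not in table:
                table[name] = (source, text)
    return table


# Hypothetical declarations, invented for illustration only.
lpd = [("color", "green")]
dtd = [("color", "red"), ("title", "Untitled")]

# Order required by ISO 8879: DTD first, then LPD.
old_order = resolve_entities([("DTD", dtd), ("LPD", lpd)])

# Proposed order: LPD first, then DTD.
new_order = resolve_entities([("LPD", lpd), ("DTD", dtd)])

print(old_order["color"])
print(new_order["color"])
```

With the DTD first, a plain forward scan hands "color" to the DTD, so giving the LPD priority takes lookahead or a second pass. With the LPD first, the same one-pass rule gives the LPD its priority with no special casing.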
Since the specification in ISO 8879 is internally inconsistent, there is no reason to let it stand as is, except for the fact that legacy usage of the existing order can't be declared nonconforming. OTOH, the correct order (with an implied repeal of 12.1.4.1:5-7 on PE usage) would facilitate ease of use and implementation.

The solution seems to be an option in the SGML declaration to tell the two orders apart, such as a new keyword like 'OLDORDER' to follow 'EXPLICIT' in [197], with values 'NO' and 'YES' (YES being the default to reconstruct ISO 8879 for those who really need it that way.)

A lot can be said *against* something that makes matters considerably more onerous for implementors without benefiting end-users. LINK is useful - it's a shame that a mistake in ISO 8879 actively militates against its deployment.

Can we get this fixed? Please?

--
:ar