HTML and XHTML look pretty similar from the point of view of a Web page author, but they have roots in different markup languages -- SGML and XML respectively. SGML and XML have different rules and are therefore validated differently. (A very detailed description of the differences between SGML and XML, written by James Clark, is available on W3.org. Mr. Clark is also the author of OpenSP, the validation software at the heart of both Nikita and the W3C Validator.)
Since SGML and XML are different, it's important that a validator like Nikita uses the correct validation mode for your documents. On the modern Web, it's not always easy to decide which mode a document calls for, because the text/html media type (sent in the HTTP Content-Type header) is used to deliver both HTML (SGML) and XHTML (XML).
There are many arguments against sending XHTML labelled as text/html, but they're beyond the scope of this article. Instead of taking a bold stand on the handling of XML sent as text/html, Nikita simply echoes the behavior of the W3C Validator when choosing a validation mode, since that's the behavior people have learned to expect from a validator.
There are three factors that can influence which validation mode Nikita uses -- the media type (sometimes also called MIME type and content type), the doctype, and whether or not the document contains an XML declaration. The algorithm that Nikita uses looks like this:
    /* Default mode is SGML. */
    ValidationMode = SGML

    if the media type is XMLish [1]
        ValidationMode = XML
    else /* implied ==> media type is text/html */
        Sniff the doctype
        if doctype is known [2]
            ValidationMode = mode implied by the doctype [3]
        else /* implied ==> doctype is not present or is unknown */
            if an XML declaration is present
                ValidationMode = XML
Footnotes for the code above:
Row | Media Type | Doctype | XML Decl Present | Validation Mode | Warnings Issued by Nikita |
---|---|---|---|---|---|
01 | text/html | HTML | Yes | SGML | None |
02 | text/html | HTML | No | SGML | None |
03 | text/html | XHTML | Yes | XML | None, or media type/doctype conflict(1) |
04 | text/html | XHTML | No | XML | None, or media type/doctype conflict(1) |
05 | text/html | Unknown | Yes | XML | Missing doctype |
06 | text/html | Unknown | No | SGML | Missing doctype |
07 | XMLish | HTML | Yes | XML | Media type/doctype conflict |
08 | XMLish | HTML | No | XML | Media type/doctype conflict |
09 | XMLish | XHTML | Yes | XML | None |
10 | XMLish | XHTML | No | XML | None |
11 | XMLish | Unknown | Yes | XML | Missing doctype |
12 | XMLish | Unknown | No | XML | Missing doctype |

Footnotes: (1) For documents sent with a media type of text/html and an XHTML 1.0 doctype, Nikita won't issue a warning unless you specifically ask her to do so. XHTML 1.1 documents sent as text/html always generate a warning.
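The mode selection described in this article can be cross-checked against the table with a small Python sketch. This is only an illustration, not Nikita's actual code: the function name `choose_validation_mode`, its parameters, and the `DOCTYPE_MODES` mapping are hypothetical, and the sketch assumes the media type and doctype have already been classified into the same categories the table uses (text/html or XMLish; HTML, XHTML, or unknown).

```python
from typing import Optional

SGML = "SGML"
XML = "XML"

# Hypothetical mapping from a recognized doctype family to the mode it implies.
# A doctype that is missing or unrecognized is represented by None.
DOCTYPE_MODES = {"HTML": SGML, "XHTML": XML}


def choose_validation_mode(media_is_xmlish: bool,
                           doctype_family: Optional[str],
                           has_xml_declaration: bool) -> str:
    """Pick "SGML" or "XML" from the three factors named in the article."""
    mode = SGML  # default mode is SGML
    if media_is_xmlish:
        # An XMLish media type always wins.
        mode = XML
    elif doctype_family in DOCTYPE_MODES:
        # text/html with a known doctype: the doctype decides.
        mode = DOCTYPE_MODES[doctype_family]
    elif has_xml_declaration:
        # text/html, no usable doctype: fall back to the XML declaration.
        mode = XML
    return mode


# The twelve table rows, in order: (XMLish?, doctype family, XML decl?, mode).
TABLE_ROWS = [
    (False, "HTML", True, SGML), (False, "HTML", False, SGML),
    (False, "XHTML", True, XML), (False, "XHTML", False, XML),
    (False, None, True, XML), (False, None, False, SGML),
    (True, "HTML", True, XML), (True, "HTML", False, XML),
    (True, "XHTML", True, XML), (True, "XHTML", False, XML),
    (True, None, True, XML), (True, None, False, XML),
]

for xmlish, doctype, xml_decl, expected in TABLE_ROWS:
    assert choose_validation_mode(xmlish, doctype, xml_decl) == expected
```

The sketch makes one property of the table easy to see: the XML declaration only matters in rows 05 and 06, where the document is sent as text/html and the doctype is missing or unknown.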