Nikita the Spider Reports Help

Hot Pages

Hot pages are pages on your site that probably need your attention. Nikita considers a page hot if any of the following are true:

The page is unreadable
This happens either when Nikita is unfamiliar with the encoding specified for the page (which happens rarely) or when the page contains octets or octet combinations undefined in the specified encoding.

Browsers are very good at handling miscoded pages; Nikita is much less forgiving. She makes no attempt to decode the page with any likely alternate encoding nor does she try to ignore troublesome characters. This may be frustrating but it is consistent with Nikita's goal of helping you to achieve strict conformance to standards. If Nikita can't understand it, there's a good chance some other user agents will fail on it too.
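The strict behaviour described above can be approximated with a short sketch. The function name `is_readable` is hypothetical, and the sketch relies on Python's codec registry, which may know a different set of encodings than Nikita does.

```python
def is_readable(body: bytes, encoding: str) -> bool:
    """Return True only if body decodes cleanly under the declared encoding.

    No fallback encoding is tried and no bytes are skipped or replaced.
    """
    try:
        body.decode(encoding, errors="strict")
    except LookupError:
        # The encoding name is not known to this Python installation.
        return False
    except UnicodeDecodeError:
        # The body contains octets that are undefined in this encoding.
        return False
    return True
```

For example, `is_readable(b"caf\xe9", "utf-8")` is false because the lone octet 0xE9 is not valid UTF-8, even though a browser would probably guess ISO-8859-1 and display the page.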

The page specifies multiple encodings
The encodings can duplicate one another (possibly bad) or conflict (definitely bad).

Duplicate encodings (for instance, UTF-8 specified in both the HTTP Content-Type header and a META element in the document) are only a problem in that they violate the DRY (Don't Repeat Yourself) Principle – they're a first step towards conflicting encodings.

Conflicting encodings (for instance, UTF-8 specified in the HTTP Content-Type header and ISO-8859-1 specified in a META element in the document) are a sign that the page author is confused and that some user agents might not be able to read the document or render it properly.

You can read in detail about how Nikita determines a page's encoding.
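As a rough illustration of the difference between duplicate and conflicting declarations, here is a minimal sketch. The function names are hypothetical, and treating aliases such as "utf8" and "UTF-8" as the same encoding is an assumption of this sketch, not a documented Nikita behaviour.

```python
import codecs


def normalize(charset: str) -> str:
    """Map an encoding label to a canonical name where Python knows it."""
    try:
        return codecs.lookup(charset.strip()).name
    except LookupError:
        # Unknown label: fall back to a case-insensitive comparison.
        return charset.strip().lower()


def compare_declarations(http_charset, meta_charset):
    """Classify the declarations as 'none', 'single', 'duplicate' or 'conflict'.

    Each argument is the declared charset, or None if that source is silent.
    """
    if http_charset is None and meta_charset is None:
        return "none"
    if http_charset is None or meta_charset is None:
        return "single"
    if normalize(http_charset) == normalize(meta_charset):
        return "duplicate"
    return "conflict"
```

With this sketch, UTF-8 in the header and UTF-8 in a META element yields "duplicate", while UTF-8 in the header and ISO-8859-1 in a META element yields "conflict".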

The URL exceeds 72 characters
Some mail programs break URLs longer than 72 characters, which means your URL will be difficult for some people to send successfully via email.
The page contains validation errors
The doctype is missing or unknown
Warnings about unknown doctypes often result from doctype declarations that are lower case where they should not be. (Important parts of the declaration are case sensitive.) The W3C maintains a list of commonly-used doctypes.
The page's media type and doctype conflict
For instance, text/html is not an acceptable media type for XHTML 1.1 documents.

Hot Headers

Hot headers are HTTP headers sent by your server that violate one of the RFCs defining HTTP. Nikita doesn't check all HTTP response headers for validity, just the ones that commonly contain mistakes. Nikita considers a header hot if any of the following are true:

A header contains an invalid date
RFC 2616 §3.3 defines three acceptable HTTP header date formats for the headers that can contain dates ('Date', 'Expires', 'Last-Modified' and 'Retry-After'). Here's an example of each of the acceptable formats:
    Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
    Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
    Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format
    

Sending a bare number like "0" in an Expires header is a common mistake. Although the HTTP specification specifically mentions this practice, it also states clearly that it is invalid to send "0" in the Expires header.

A bare number (a delay in seconds) is permitted in the Retry-After header, and Nikita understands that such values are valid.
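The date rules above can be sketched as follows. The function names are hypothetical, and the sketch assumes an English (C) locale for day and month names. Python's `strptime` is somewhat more lenient than the RFC grammar, so treat this as an approximation rather than a conformance test.

```python
from datetime import datetime

# One strptime pattern per format listed in RFC 2616 section 3.3.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850
    "%a %b %d %H:%M:%S %Y",       # ANSI C's asctime() format
)


def is_valid_http_date(value: str) -> bool:
    """Return True if value parses under any of the three formats."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def is_valid_date_header(name: str, value: str) -> bool:
    """Check a date-bearing header; Retry-After may also be a number of seconds."""
    value = value.strip()
    if name.lower() == "retry-after" and value.isascii() and value.isdigit():
        return True
    return is_valid_http_date(value)
```

Under this sketch `is_valid_date_header("Expires", "0")` is false, while `is_valid_date_header("Retry-After", "120")` is true.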

A Location header value is not an absolute URL
Location headers must refer to an absolute URL according to RFC 2616 §14.30. Sending a relative URL like /foo/bar.html is a common mistake.
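A minimal sketch of such a check, assuming that "absolute" means the value carries both a scheme and a host; the function name is hypothetical and Nikita's actual test may differ.

```python
from urllib.parse import urlsplit


def is_absolute_url(value: str) -> bool:
    """Return True if the Location value has both a scheme and a host."""
    parts = urlsplit(value.strip())
    return bool(parts.scheme) and bool(parts.netloc)
```

Here `is_absolute_url("http://www.example.com/foo/bar.html")` is true, while `is_absolute_url("/foo/bar.html")` is false.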
The response code is unknown
Nikita will let you know if your server sends a response code that's not in the HTTP Status Code Registry.
A header contains invalid characters
RFC 2616 §4.2 states that only printable ASCII (33 - 126 inclusive) plus whitespace (tab, CR, LF and space) are valid in header values. Nikita considers a header hot if it contains characters outside of this range. RFC 2047 §4 explains how to properly encode characters that are outside that range.
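The character range described above translates directly into a short check. The function name is hypothetical.

```python
# Whitespace characters that are allowed in addition to printable ASCII.
ALLOWED_WHITESPACE = {"\t", "\r", "\n", " "}


def has_invalid_characters(value: str) -> bool:
    """Return True if any character is outside 33-126 and is not allowed whitespace."""
    return any(
        not (33 <= ord(ch) <= 126 or ch in ALLOWED_WHITESPACE)
        for ch in value
    )
```

For instance, a header value containing "é" sent as a raw character would be flagged, while `text/html; charset=utf-8` would not.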
A response lacks a Date header
RFC 2616 §14.18 states that a Date header must be included with all responses. There are a few response codes (like those in the 5xx Server Error range) that are exceptions to this rule, and Nikita takes into account those exceptions.

Statistics

URLs inspected
This is a count of the unique URLs on your site that Nikita visited in an attempt to collect information. It includes URLs that returned a response code of 200 as well as 30x redirects, 404s, etc.

It does not include URLs that were visited only during link checking. Link checks don't record much besides the response code, so Nikita can't count them as "inspected". There are two reasons why Nikita might have checked a URL on your site while excluding it from full inspection. The first is that your crawl may have been limited to a certain number of pages. Once Nikita reaches that limit, she doesn't inspect any more URLs, but she will check the remaining URLs that she knows about. The second is that you may have asked Nikita to check parameterized URLs but not to visit (inspect) them. The options to visit and check parameterized URLs are both off by default.

Resources found
This is a count of the unique URLs on your site that Nikita visited and found present (i.e. that returned an HTTP status code of 200). It is a subset of URLs inspected (above).
Pages found
This is a count of all the (X)HTML pages on your site. More specifically, it is a count of the unique URLs on your site that Nikita visited and which returned (a) an HTTP status code of 200 and (b) an (X)HTML media type. It is a subset of resources found (above).
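The relationship between the three counts can be illustrated with a small sketch. The sample data, the `summarize` function and the set of media types treated as (X)HTML are all assumptions made for illustration.

```python
# Assumption: these two media types are what count as "(X)HTML" here.
XHTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}

# Hypothetical crawl results: (url, status code, media type).
visits = [
    ("http://example.com/", 200, "text/html"),
    ("http://example.com/logo.png", 200, "image/png"),
    ("http://example.com/old", 301, None),
    ("http://example.com/missing", 404, "text/html"),
]


def summarize(visits):
    """Return (URLs inspected, resources found, pages found)."""
    inspected = {url for url, _, _ in visits}
    resources = {url for url, status, _ in visits if status == 200}
    pages = {
        url
        for url, status, media_type in visits
        if status == 200 and media_type in XHTML_MEDIA_TYPES
    }
    return len(inspected), len(resources), len(pages)
```

For the sample data this gives 4 URLs inspected, 2 resources found and 1 page found; each count is a subset of the one before it.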
Page size -- mean, median and mode
These statistics are calculated based on the sizes (byte count) of pages delivered to Nikita. The mean is an arithmetic mean expressed in bytes and rounded to the nearest integer. The list of mode page sizes can appear to contain duplicate values due to rounding of the displayed values. In other words, the actual mode values differ, but only by less than 0.01 KiB.
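The rounding effect is easier to see with numbers. This sketch uses Python's `statistics` module (`multimode` needs Python 3.8 or later) on made-up page sizes, and it assumes that modes are displayed in KiB with two decimal places.

```python
import statistics

# Hypothetical page sizes in bytes; two sizes tie for most frequent.
sizes = [512, 10240, 10240, 10245, 10245]

mean_bytes = round(statistics.mean(sizes))   # arithmetic mean, nearest integer
median_bytes = statistics.median(sizes)
modes = statistics.multimode(sizes)          # every value with the top frequency

# Displayed with two decimals in KiB, distinct modes can look identical.
displayed_modes = [f"{size / 1024:.2f} KiB" for size in modes]
```

Here the modes are 10240 and 10245 bytes, which differ by less than 0.01 KiB and so both display as "10.00 KiB".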
Hot Pages
Listed here are the total number of hot pages and the count of pages that are hot broken down by reason. You can read in detail about what Nikita considers problematic enough for inclusion in the hot pages list. Note that the sum of the percentages can exceed 100% because each page can have multiple reasons for being on the hot list.

The last four hot page statistics contain percentages based on the total number of readable pages. They refer to problems with the document content and it doesn't make sense for Nikita to report content errors for pages she can't read. If there are no unreadable pages on your site, you can ignore this statistical detail.

Hot Headers
Listed here is the total number of URLs on your site that responded with HTTP headers that violate standards (usually RFC 2616, the HTTP 1.1 specification). You can read in detail about what Nikita considers problematic enough for inclusion in the hot headers list. Note that the percentages may add up to more than 100% because each URL can have multiple reasons for being on the hot list.
Links
A link is any reference to another resource. For instance, the href attribute of an <a> element counts as a link as does the src attribute of an <img> or <frame> element. Nikita breaks down the links she sees by scheme (the part of a URI that comes before the ":" or "://"). Note that javascript: links are not references to JavaScript files but are <a> elements coded like so: <a href='javascript:alert("boo!");'>. Links in the "Other" category are often mistakes, so watch out for these.

Nikita further divides HTTP, HTTPS and FTP links into internal and outbound based on the destination of the link. Destinations in the same domain as the seed URL are internal; all other destinations make a link outbound. Note that if the seed URL is www.example.com, a URL with a domain like news.example.com or ftp.example.com is considered outbound.
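The scheme breakdown and the internal/outbound split can be sketched as below. The function names are hypothetical, and the sketch assumes that links have already been resolved to absolute URLs and that "same domain" means an exact, case-insensitive host match.

```python
from urllib.parse import urlsplit


def link_scheme(href: str) -> str:
    """Return the lower-cased scheme of a link, e.g. 'http' or 'javascript'."""
    return urlsplit(href).scheme.lower()


def is_internal(link_url: str, seed_url: str) -> bool:
    """Return True if the link's host matches the seed URL's host exactly."""
    # .hostname is already lower-cased by urlsplit.
    return urlsplit(link_url).hostname == urlsplit(seed_url).hostname
```

With a seed of `http://www.example.com/`, a link to `http://www.example.com/about` is internal, while links to `http://news.example.com/` and `ftp://ftp.example.com/file` are outbound.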

Media Types
This is a list of the media types reported to Nikita in the HTTP Content-Type header of the URLs she visited. Nikita counts the percentages shown based on the total number of resources that she found.

URLs that don't supply a media type are assigned application/octet-stream as per Section 7.2.1 of RFC 2616 (HTTP 1.1).

Encodings
This is a list of the primary encodings supplied for each page. If a page specifies multiple encodings, only one is listed here. Percentages are calculated based on the total number of pages. You can read in detail about how Nikita determines a page's encoding.
Doctypes
This is a list of the doctypes supplied for each page. Percentages are calculated based on the total number of readable pages.