Nikita the Spider

Nikita's Statistics Explained

Total resources found
This is a count of the unique URLs on your site that Nikita visited and found present (i.e., that returned an HTTP status code of 200).

Note that if you're in the unusual situation where the option to check parameterized links is on but the option to visit parameterized URLs is off, then the checked links will not be included in this count.

Total pages
This is a count of all the (X)HTML pages on your site. More specifically, it is a count of the unique URLs on your site that Nikita visited and which returned (a) an HTTP status code of 200 and (b) an (X)HTML media type.
Total readable pages
This is the total page count minus the number of pages that Nikita was unable to decode using the page's explicit or implied encoding (character set). It's bad news if this is less than the total pages.

Note that browsers are very good at handling miscoded pages; Nikita is much less forgiving. She makes no attempt to decode the page with any likely alternate encoding nor does she try to ignore troublesome characters. This may be frustrating but it is consistent with Nikita's goal of helping you to achieve strict conformance to standards. If Nikita can't understand it, there's a good chance some other user agents will fail on it too.
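
The strictness described above can be illustrated with a minimal Python sketch. This is not Nikita's actual code; the function name is_readable is hypothetical, and the sketch only assumes that "unable to decode" means a strict decode under the declared encoding raises an error.

```python
def is_readable(raw: bytes, encoding: str) -> bool:
    """Return True only if every byte decodes cleanly under the given encoding.

    No fallback encoding is tried and no troublesome bytes are skipped.
    """
    try:
        raw.decode(encoding, errors="strict")
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: bytes are invalid in this encoding.
        # LookupError: the encoding name itself is unknown.
        return False
    return True


# A page whose bytes were written as ISO-8859-1.
page = "café".encode("iso-8859-1")

print(is_readable(page, "iso-8859-1"))  # True
print(is_readable(page, "utf-8"))       # False: the byte 0xE9 is not valid UTF-8 here
```

A browser would probably guess its way past the second case; a strict decoder reports it as unreadable.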

You can read in detail about how Nikita determines a page's encoding.

Total valid pages
This is the readable page count minus the number of pages on which Nikita found validation errors.
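
To show how the four counts above relate to one another, here is a small Python sketch. The Visit record, its field names, and the set of media types treated as (X)HTML are all assumptions made for illustration; they are not taken from Nikita's implementation.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical record of one visited URL.
    url: str
    status: int
    media_type: str
    decoded_ok: bool
    validation_errors: int


# Assumption: these are the media types counted as (X)HTML.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def summarize(visits):
    # Keep one record per unique URL.
    unique = list({v.url: v for v in visits}.values())
    found = [v for v in unique if v.status == 200]
    pages = [v for v in found if v.media_type in HTML_TYPES]
    readable = [v for v in pages if v.decoded_ok]
    valid = [v for v in readable if v.validation_errors == 0]
    return {
        "resources": len(found),
        "pages": len(pages),
        "readable": len(readable),
        "valid": len(valid),
    }


visits = [
    Visit("http://example.com/", 200, "text/html", True, 0),
    Visit("http://example.com/a", 200, "text/html", True, 3),
    Visit("http://example.com/b", 200, "application/xhtml+xml", False, 0),
    Visit("http://example.com/logo.png", 200, "image/png", True, 0),
    Visit("http://example.com/missing", 404, "text/html", True, 0),
]

print(summarize(visits))
# {'resources': 4, 'pages': 3, 'readable': 2, 'valid': 1}
```

Each count is a subset of the one before it, so the numbers can only stay the same or shrink from one line to the next.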
Page Size -- Mean, Median and Mode
These statistics are all calculated based on the sizes of the readable pages. Unreadable pages are ignored. The mean is an arithmetic mean expressed in bytes and rounded to the nearest integer. The list of mode page sizes can appear to contain duplicate values because the displayed values are rounded. In other words, the actual mode values differ, but by less than 0.01 KiB.
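
The following Python sketch shows how two distinct modes can look identical once displayed. The sample sizes are invented, and the two-decimal KiB display format is an assumption based on the 0.01 KiB figure mentioned above.

```python
import statistics

# Invented sizes, in bytes, of six readable pages.
sizes = [1500, 4100, 4100, 4101, 4101, 9000]

mean = round(statistics.mean(sizes))   # arithmetic mean, to the nearest byte
median = statistics.median(sizes)
modes = statistics.multimode(sizes)    # every value tied for most frequent

# Assumed display format: KiB with two decimal places.
displayed = [f"{m / 1024:.2f} KiB" for m in modes]

print(mean)       # 4484
print(median)     # 4100.5
print(modes)      # [4100, 4101]
print(displayed)  # ['4.00 KiB', '4.00 KiB']
```

The two modes differ by a single byte, which disappears when the values are rounded for display.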
Links
A link is any reference to another resource. For instance, the href attribute of an <a> element counts as a link as does the src attribute of an <img> or <frame> element. Nikita breaks down the links she sees by scheme (the part of a URI that comes before the ":" or "://"). Note that javascript: links are not references to JavaScript files but are <a> elements coded like so: <a href='javascript:alert("boo!");'>. Links in the "Other" category are often mistakes, so watch out for these.

Nikita further divides HTTP, HTTPS and FTP links into internal and outbound based on the destination of the link. Destinations in the same domain as the seed URL are internal; a link to any other destination is outbound. Note that if the seed URL is www.example.com, a URL with a domain like news.example.com or ftp.example.com is considered outbound.
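
The classification described above can be sketched in Python as follows. The function name classify is hypothetical, and the sketch makes two assumptions: links have already been resolved to absolute URIs, and "same domain" means an exact host name match (which is what the www.example.com example implies).

```python
from urllib.parse import urlsplit

# Schemes that are further divided into internal and outbound.
SPLIT_SCHEMES = ("http", "https", "ftp")


def classify(link: str, seed: str):
    """Return (scheme, destination); destination is None for other schemes."""
    parts = urlsplit(link)
    scheme = parts.scheme.lower()
    if scheme not in SPLIT_SCHEMES:
        return scheme, None
    seed_host = urlsplit(seed).hostname
    destination = "internal" if parts.hostname == seed_host else "outbound"
    return scheme, destination


seed = "http://www.example.com/"

print(classify("http://www.example.com/about", seed))  # ('http', 'internal')
print(classify("http://news.example.com/", seed))      # ('http', 'outbound')
print(classify("ftp://ftp.example.com/file", seed))    # ('ftp', 'outbound')
print(classify('javascript:alert("boo!");', seed))     # ('javascript', None)
```

Because the comparison is on the full host name, subdomains of the seed's domain fall on the outbound side.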

Media Types
This is a list of the media types reported to Nikita in the HTTP Content-Type header of the URLs she visited. Nikita calculates the percentages shown based on the total number of resources that she found.

URLs that don't supply a media type are assigned application/octet-stream, as per Section 7.2.1 of the HTTP 1.1 specification (RFC 2616).
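
As a rough illustration of the fallback and the percentage calculation, here is a Python sketch. The function name media_type and the sample headers are invented, and stripping parameters such as charset before counting is an assumption rather than something this page states.

```python
from collections import Counter

DEFAULT_TYPE = "application/octet-stream"


def media_type(content_type_header):
    """Extract the media type from a Content-Type value, with the default fallback."""
    if not content_type_header or not content_type_header.strip():
        return DEFAULT_TYPE
    # Assumption: drop parameters such as "; charset=utf-8" before counting.
    bare = content_type_header.split(";", 1)[0].strip().lower()
    return bare or DEFAULT_TYPE


# Invented Content-Type values for four resources; None means no header was sent.
headers = ["text/html; charset=utf-8", "text/html", "image/png", None]

counts = Counter(media_type(h) for h in headers)
percentages = {t: 100 * n / len(headers) for t, n in counts.items()}

print(percentages)
# {'text/html': 50.0, 'image/png': 25.0, 'application/octet-stream': 25.0}
```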

Encodings
This is a list of the primary encodings supplied for each page. If a page specifies multiple encodings, only one is listed here. Percentages are calculated based on the total number of pages. You can read in detail about how Nikita determines a page's encoding.
Doctypes
This is a list of the doctypes supplied for each page. Percentages are calculated based on the total number of readable pages.