Nikita the Spider

By The Numbers

In the spring and fall of 2008 I wrote a couple of articles discussing statistics that summarize the sites that Nikita had recently crawled. The statistics cover the frequency of common validation errors, encodings and how they're specified, the frequency of common doctypes, and media types.

The latter article expands coverage a little to include validation mode, unreadable pages, missing titles and HTTP header problems. It also uses a larger and differently constructed sample.