Nikita the Spider

FAQ

  1. What does this service do?
    Nikita is a Web site quality checker. She checks your Web site's pages for valid (X)HTML, broken links, malformed HTTP headers, encoding conflicts and more. Here's a very thorough list of Nikita's features.
  2. Who is this service for?
    This service is of most use to Webmasters and SEO professionals. It is especially helpful for Webmasters of sites with multiple authors or automatically generated content. Nikita stays on the lookout for entropy's little helpers – link rot, tag soup, encoding inconsistencies and the like – and lets you know about them in concise, complete reports.
  3. Will Nikita start crawling my site right away?
    Yes, if possible. Nikita can only crawl a certain number of sites at one time, so if she's already running at full capacity you might have to wait a while before she can start crawling yours. Also note that if she's already crawling a site, she needs to complete that crawl before she can start another on the same site.
  4. How long does it take Nikita to crawl my site?
    It depends on the size of your site. As a very rough estimate, expect it to take 20 to 30 seconds per page. If this sounds slow to you, please note two things. First, Nikita spends most of her time waiting between requests so as not to overload your server. This wait is governed by the "politeness delay" option, which defaults to five seconds.

    Second, the time it takes to spider your site also depends on the number of elements that each page refers to (like images, CSS files, PDFs, scripts, etc.). Nikita has to investigate every item referenced by your site in order to report accurately about it. There might be more of these than you realize, which means that it might take Nikita longer than you expect to crawl your site.

    While Nikita is working, she gives you an estimate of the time remaining. That estimate reflects only what Nikita knows about your site at that point. It might grow as she discovers more of your site.

    The "elapsed time" on the completed statistics report tells you exactly how long Nikita spent spidering your site.
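
    To make the rough estimate above concrete, here is a minimal sketch that multiplies a page count by the 20 to 30 second range. The function name and its parameters are hypothetical and not part of Nikita's service; the real figure also depends on how many images, stylesheets and other items each page references.

```python
def estimate_crawl_seconds(page_count, low_per_page=20, high_per_page=30):
    """Return a (low, high) crawl-time estimate in seconds.

    The per-page defaults come from the FAQ's "20 to 30 seconds per page"
    rule of thumb; everything else here is an illustrative assumption.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return page_count * low_per_page, page_count * high_per_page


low, high = estimate_crawl_seconds(100)
print(f"100 pages: roughly {low / 60:.0f} to {high / 60:.0f} minutes")
```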

  5. How much does this service cost?
    Two cents (US$0.02) per page that Nikita finds, with the first 125 pages absolutely free. You might find it useful to look at some precalculated sample costs.
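
    The pricing rule can be sketched as a small calculation. This assumes the 125 free pages are deducted from each crawl's page count, which the FAQ does not spell out; the constant and function names are hypothetical. Amounts are kept in whole cents to avoid floating-point rounding.

```python
FREE_PAGES = 125      # pages that cost nothing (from the FAQ)
CENTS_PER_PAGE = 2    # two US cents per page found (from the FAQ)


def crawl_cost_cents(pages_found):
    """Return the cost in US cents, assuming the free pages apply per crawl."""
    if pages_found < 0:
        raise ValueError("pages_found must not be negative")
    billable = max(0, pages_found - FREE_PAGES)
    return billable * CENTS_PER_PAGE


for pages in (100, 125, 500, 1000):
    cents = crawl_cost_cents(pages)
    print(f"{pages} pages: ${cents / 100:.2f}")
```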
  6. Why should I create an account?
    Creating an account with Nikita is free and gives you a few benefits. First, Nikita will remember which sites you've asked her to crawl. You'll get a list of them when you log in.

    You can also ask Nikita to follow up on a crawl that she's done for you before. (You can read more about followup crawls just below.) In addition, as a registered user, you won't have to type in your email address when you start a crawl.

  7. How large a site can Nikita crawl?
    The largest site that Nikita has spidered had over 200,000 pages. Nikita is capable of quite a lot, but you should also consider whether you will be able to process all of the information she generates.

    If you think your site will have few validation errors, encoding problems, broken links and so forth, then don't hesitate to ask Nikita to crawl it all at once. If you have a large site and you expect Nikita to have a lot to say about it, consider asking her to cover it in multiple crawls. Supplying a path in the seed URL is a simple, effective way of dividing your site into chunks.

  8. What is a followup crawl?
    A followup crawl is when Nikita recrawls a site that she's already crawled for you. This allows Nikita to narrow the focus of her reports so that she only reports on pages changed since her previous visit. Followup crawls are particularly useful when you want to see if you've fixed the problems that Nikita found on her previous visit.

    Another benefit of followup crawls is that Nikita takes advantage of HTTP caching and won't refetch a page if (a) your server indicated it was cacheable and (b) Nikita found no errors in it during the previous crawl. This means less traffic on your server and potentially faster crawls.

    When starting a followup crawl, you can still alter any of Nikita's advanced options (like lowering the politeness delay or providing a URL filter).

    You must be logged in to start a followup crawl.
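
    The caching rule described for followup crawls boils down to a two-condition check. The sketch below is only an illustration of that rule: the function name and its boolean inputs are hypothetical, and it does not model how cacheability is actually determined from HTTP headers.

```python
def should_refetch(was_cacheable, had_errors):
    """Decide whether a page is fetched again on a followup crawl.

    A page is skipped only when the server marked it cacheable AND the
    previous crawl found no errors in it; otherwise it is refetched.
    """
    can_skip = was_cacheable and not had_errors
    return not can_skip


# Cacheable and clean last time: no need to fetch it again.
print(should_refetch(was_cacheable=True, had_errors=False))
# Cacheable but it had errors: fetch again to see whether they were fixed.
print(should_refetch(was_cacheable=True, had_errors=True))
```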

  9. I'm a Webmaster. How do I keep Nikita out of my site?
    Nikita obeys the robots.txt standard. She will obey any rule that uses a user-agent of Nikita The Spider or NikitaTheSpider. For instance, a rule like this would keep Nikita out of your entire site:
    User-agent: Nikita The Spider
    Disallow: /
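
    If you'd like to sanity-check such a rule before deploying it, one option is Python's standard-library robots.txt parser. This is a sketch only: it shows how Python's parser reads the rule, which is not necessarily identical to Nikita's own matching, and the example.com URL and the second bot name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The same rule shown above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Nikita The Spider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "http://example.com/page.html"  # placeholder URL

# True when the rule keeps Nikita away from the page.
blocked = not parser.can_fetch("Nikita The Spider", page)
# Crawlers not named in the file are unaffected by this rule.
others_allowed = parser.can_fetch("SomeOtherBot", page)

print("Nikita blocked:", blocked)
print("Other crawlers allowed:", others_allowed)
```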
  10. Does Nikita understand custom doctypes?
    No. When Nikita encounters an unfamiliar formal public identifier, she falls back to using HTML 4.01 Transitional (or XHTML 1.0 Transitional for documents delivered with an XHTML media type). There is a list of the FPIs that Nikita recognizes.
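
    The fallback behaviour can be pictured as a simple lookup. This is a sketch under assumptions, not Nikita's implementation: the set of recognized FPIs below is a small illustrative subset rather than her actual list, and treating only application/xhtml+xml as an XHTML media type is a guess.

```python
# Illustrative subset only; Nikita's real list of recognized FPIs is longer.
KNOWN_FPIS = {
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
}

# Assumption: which media types count as "XHTML" is not stated in the FAQ.
XHTML_MEDIA_TYPES = {"application/xhtml+xml"}

HTML_FALLBACK = "-//W3C//DTD HTML 4.01 Transitional//EN"
XHTML_FALLBACK = "-//W3C//DTD XHTML 1.0 Transitional//EN"


def effective_fpi(declared_fpi, media_type):
    """Return the FPI to validate against, falling back for unknown doctypes."""
    if declared_fpi in KNOWN_FPIS:
        return declared_fpi
    if media_type in XHTML_MEDIA_TYPES:
        return XHTML_FALLBACK
    return HTML_FALLBACK


# A made-up custom doctype served as text/html falls back to HTML 4.01 Transitional.
print(effective_fpi("-//EXAMPLE//DTD Custom 1.0//EN", "text/html"))
```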
  11. My site uses ASP.Net. Is there a browser capabilities file for Nikita?
    Yes. Here's an ASP.Net browser caps file for Nikita kindly submitted by Kevin F. Note that this file lies a little bit because Nikita supports neither Java applets nor JavaScript. However, setting these options to "true" makes ASP.Net create pages for Nikita that are similar to those created for an ordinary Web browser, which is probably the content that you want Nikita to validate.
  12. Who is behind this service?
    Philip Semanchuk is the designer, programmer, server monkey, cat wrangler, chief cook and bottle washer.