Nikita the Spider

Nikita News and Updates for 2007

November 2007

As of November 4th, Nikita can perform followup crawls. These are very useful for making sure that you've fixed the problems that Nikita found for you on a previous crawl. Followup crawls are available to registered users only.

On November 15th a few bugs got fixed and there's a small cosmetic change to the statistics report. In addition, a new report is in the works that will tell you how frequently each validation message appears on your site so that you know which ones are your biggest problems. That will be made public soon.

As of November 18th, the validation message frequency report is now available.
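For the curious, here is a rough idea of what a message frequency count looks like in Python. This is only a sketch and not Nikita's actual code; the function name message_frequencies and the shape of the input (a dict mapping each page URL to the list of validation messages found on it) are assumptions made for the example.

```python
from collections import Counter


def message_frequencies(messages_by_page):
    """Count how often each validation message appears across a site.

    messages_by_page is a hypothetical structure: a dict that maps a page
    URL to the list of validation messages reported for that page.
    Returns (message, count) pairs, most frequent first.
    """
    counts = Counter()
    for messages in messages_by_page.values():
        counts.update(messages)
    return counts.most_common()
```

The messages at the top of the returned list are the ones worth fixing first, since they show up most often.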

October 2007

In late October, user testing started on a new feature where Nikita will follow up a crawl that she's already done with another crawl that reports only on changed pages. It should be available soon.

September 2007

Nikita validated her 2 millionth page on September 18th.

A couple of important bugs were fixed on the 27th: the URL filter regex now matches any part of the URL string (rather than matching from the first character only), and the report tarballs and Zip files now extract into their own directory rather than into the same directory as the tarball/Zip file. Sorry about the previous obnoxious behavior!
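The difference between the old and new filter behavior is easy to show with Python's re module, where re.match only matches at the start of the string and re.search matches anywhere. This is just an illustration of the idea, not Nikita's own code, and the two function names are made up for the example.

```python
import re


def filter_matches_anywhere(pattern, url):
    """New-style behavior: the pattern may match any part of the URL."""
    return re.search(pattern, url) is not None


def filter_matches_from_start(pattern, url):
    """Old-style behavior: the pattern must match at the first character."""
    return re.match(pattern, url) is not None
```

With a pattern like /blog/ and a URL like http://example.com/blog/post.html, the first function finds a match and the second does not, which is why filters written against a path fragment used to fail silently.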

August 2007

On August 16th, Nikita's selection of validation mode changed to keep in step with the new version of the W3C validator.

July 2007

On July 20th, Nikita's page reports got some minor cosmetic updates. Enjoy!

May 2007

Another popular idea from the future features page is now a reality. As of May 14th, Nikita has a new Hot Headers report. This report flags some common HTTP header errors like malformed dates and relative URLs in Location headers. Enjoy!
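As an example of the second kind of error, RFC 2616 (section 14.30) says a Location header should hold an absolute URI. A check along these lines would catch relative ones. It is a minimal sketch that assumes "absolute" means "has both a scheme and a host"; the function name is hypothetical and this is not necessarily how Nikita does it.

```python
from urllib.parse import urlsplit


def location_is_absolute(value):
    """Return True if a Location header value looks like an absolute URL.

    Assumption for this sketch: an absolute URL has both a scheme
    (e.g. http) and a network location (e.g. example.com).
    """
    parts = urlsplit(value.strip())
    return bool(parts.scheme) and bool(parts.netloc)
```

A value such as /new/page.html fails the check, while http://example.com/new/page.html passes.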

On May 28th, Nikita had her first birthday. Now she's old enough to start crawls and send out notifications automatically. Happy Birthday!

April 2007

Nikita is now in beta test. As part of this move from alpha to beta, you can now create an account which provides you with a few benefits. First of all, Nikita can list the sites you've asked her to crawl and give you a link to the reports for each. Second, you won't have to type in your email address every time you start a crawl. Last but not least, from now on Nikita will limit anonymous crawls to 150 pages.

As of April 25th, Nikita now evaluates all HTTP header fields that can contain dates ('Date', 'Expires', 'Last-Modified' and 'Retry-After') to make sure the dates are valid according to §3.3 of RFC 2616.
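If you want to run a similar check yourself, here is one way to do it in Python. RFC 2616 §3.3.1 allows three date formats (RFC 1123, RFC 850 and asctime). The sketch below parses the value and then formats it again, accepting it only if the round trip reproduces the original string; that catches wrong weekday names and missing zero padding. The function names are made up for the example, the code assumes an English (C) locale for day and month names, and it is not a description of Nikita's own implementation.

```python
from datetime import datetime

# The two GMT-suffixed formats from RFC 2616 section 3.3.1.
_GMT_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 1123: Sun, 06 Nov 1994 08:49:37 GMT
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850:  Sunday, 06-Nov-94 08:49:37 GMT
)
_ASCTIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # Sun Nov  6 08:49:37 1994


def _format_asctime(dt):
    # asctime pads the day of the month with a space, not a zero.
    return f"{dt:%a %b} {dt.day:2d} {dt:%H:%M:%S %Y}"


def is_valid_http_date(value):
    """Return True if value is in one of the three RFC 2616 date formats."""
    for fmt in _GMT_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Round trip: rejects wrong weekdays, unpadded days, wrong case.
        return parsed.strftime(fmt) == value
    try:
        parsed = datetime.strptime(value, _ASCTIME_FORMAT)
    except ValueError:
        return False
    return _format_asctime(parsed) == value


def is_valid_retry_after(value):
    """Retry-After may be an HTTP date or a whole number of seconds."""
    if value.isascii() and value.isdigit():
        return True
    return is_valid_http_date(value)
```

Retry-After gets its own helper because, unlike the other three headers, it can legally carry a plain number of seconds instead of a date.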

On April 26th I added some test pages that are not very interesting to look at in themselves, but which allow the sample reports to demonstrate how Nikita reports errors.

March 2007

Nikita has now validated more than 1.5 million pages!

As of March 5th, the page reports now have "previous" and "next" links at the bottom of each page, just like the ones I added to the Hot Links reports last month. I'm not sure why it didn't occur to me to do both of these at the same time.

As of March 17th, the Hot Pages report is much improved and the statistics report has a new look, more information and better organization. The new Hot Pages report will also work much better for large Web sites.

February 2007

On February 11th, I added a Python module for you programmers out there. The module is called shm and it allows IPC from Python under most *nix operating systems. It was originally written by Vladimir Marangozov; this version is cleaned up a little and has a nicer interface (IMO).

As of February 16th, Nikita now reports each page's Last-Modified header (if it sends one) and warns you if she finds any that are not RFC 2616 compliant. That's a feature a number of you have asked for. In addition, hot links reports that stretch over multiple pages now have previous/next links at the bottom of each page to make stepping through them easier.

As of February 18th, Nikita's default politeness delay is now five seconds (down from seven). In addition, one can now specify a delay of zero seconds in which case Nikita will fetch and process pages as fast as she can.
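For anyone wondering what a politeness delay amounts to in practice, it is simply a pause between successive requests to the same site. Here is a minimal sketch of the idea; the function and parameter names are invented for the example (fetch stands in for whatever downloads and processes a page), and this is not Nikita's actual crawl loop.

```python
import time


def crawl(urls, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    fetch is a hypothetical callable that downloads and processes one page.
    A delay of zero means pages are fetched as fast as possible.
    sleep is injectable so the loop can be tested without really waiting.
    """
    results = []
    for index, url in enumerate(urls):
        # No need to wait before the very first request.
        if index > 0 and delay_seconds > 0:
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

A zero delay is only a good idea on a server you own or one that you know can take the load.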

January 2007

Happy New Year! On January 11th, the statistics report got a facelift and a detailed explanation of how Nikita calculates those numbers. There was also a backend change that makes Nikita more efficient, which should make it easier for her to spider multiple sites simultaneously.

As of January 21st, the Hot Links report now has an option at the top of each page to temporarily hide links that Nikita has not checked. This makes it a lot easier to find those nasty 404s.


2006 is, like, so last year! But if you want to read about it, Nikita's 2006 is detailed here.