The seed URL is Nikita's starting point.
It can be any page on your site but it's typically the home page,
e.g. http://www.example.com/. If there's a path in the seed URL (as in the
examples below), Nikita will restrict herself to the subtree beneath that path. That makes
it easy to validate just a portion of your site.
Here are some sample seed URLs:
http://example.com/
http://www.example.com/wombats/
http://www.example.com/wombats/CaughtInRudePoses.html
In the second and third examples, Nikita would restrict herself to the
/wombats/ subtree of example.com.
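Here's a minimal Python sketch of one way a crawler could apply the subtree rule. It's only an illustration, not Nikita's actual code. The helper names `subtree_root()` and `in_subtree()` are hypothetical, and the sketch assumes that a trailing file name (like CaughtInRudePoses.html) is dropped to find the subtree. It compares host names exactly, so it ignores the WWW equivalence option described further down.

```python
from urllib.parse import urlsplit


def subtree_root(seed_url: str) -> str:
    """Hypothetical helper: the directory path that bounds the crawl."""
    path = urlsplit(seed_url).path or "/"
    # Assumption: drop any trailing file name, so /wombats/page.html -> /wombats/
    return path[: path.rfind("/") + 1]


def in_subtree(seed_url: str, url: str) -> bool:
    """True if url is on the seed's host and beneath the seed's subtree."""
    seed, candidate = urlsplit(seed_url), urlsplit(url)
    if seed.netloc.lower() != candidate.netloc.lower():
        return False  # different host, so it's outside the crawl entirely
    return (candidate.path or "/").startswith(subtree_root(seed_url))


if __name__ == "__main__":
    seed = "http://www.example.com/wombats/CaughtInRudePoses.html"
    print(in_subtree(seed, "http://www.example.com/wombats/baby.html"))  # True
    print(in_subtree(seed, "http://www.example.com/koalas/"))            # False
```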
Nikita doesn't yet support using an IP address as the seed URL. If you're interested in that feature, please vote for it.
Nikita needs your email address in order to notify you when she's done. If you don't provide an email address, be careful not to close your browser window once you start Nikita; otherwise you might forget the location of your reports. I promise not to spam/share/rent/sell your email address, but talk is cheap and you might not trust me. In that case I suggest using a disposable email address from a service like Spam Gourmet, Sneakemail, Mailinator or Yahoo.
Free crawls stop at 125 pages, whereas deluxe crawls are unlimited (unless you specify a page limit). Deluxe crawls are not free.
Following parameterized URLs is off by default. If you turn it on, Nikita will follow URLs that contain parameters (also known as a query string). Here are some examples:
http://www.example.com/photos.php?id=27
http://www.example.com/users/profiles.jsp?name=SpinyNorman
http://www.example.com/ShoppingCart.asp?catalog=26&itemid=5
http://www.example.com/forum/post.php?article=5af23be6012cc4
Why is this off by default? Dynamic (query-driven) pages often come in groups in which each page varies only a little. Validating one, three or a dozen of them might effectively validate the entire group, and it's certainly cheaper and faster than waiting for Nikita to slog through, say, hundreds of catalog pages that differ only in a picture. If you're interested in switching this option on, a well-constructed filter regex might be helpful.
Note that Nikita never follows URLs that contain session IDs regardless of how this option is set.
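The two rules above (parameterized URLs are followed only when the option is on, and session-ID URLs are never followed) can be sketched in Python like this. It's not Nikita's real logic. In particular, how she recognizes session IDs isn't documented here, so the `SESSION_PARAM_NAMES` set is a hypothetical stand-in, and the function names are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical list: Nikita's actual session-ID detection is not described here.
SESSION_PARAM_NAMES = {"phpsessid", "jsessionid", "sid", "sessionid"}


def is_parameterized(url: str) -> bool:
    """True if the URL carries a query string (the part after '?')."""
    return bool(urlsplit(url).query)


def looks_like_session_url(url: str) -> bool:
    """Hypothetical heuristic: any query parameter named like a session ID."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name.lower() in SESSION_PARAM_NAMES for name in params)


def should_follow(url: str, follow_parameterized: bool = False) -> bool:
    if looks_like_session_url(url):
        return False  # never followed, however the option is set
    if is_parameterized(url):
        return follow_parameterized  # off by default
    return True
```

With the option left at its default, `should_follow("http://www.example.com/photos.php?id=27")` is False; passing `follow_parameterized=True` makes it True.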
Checking parameterized URLs is off by default because some sites have a lot of URLs that point to what is essentially the same code, as described above. Note that this option only applies to link checks, not to spidering for new links.
The page limit tells Nikita not to visit more than a certain number of pages on your site.
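As a rough sketch of how the page limit might combine with the free/deluxe distinction mentioned earlier (free crawls stop at 125 pages, deluxe crawls are unlimited unless you set a limit), here's a hypothetical Python helper. The assumption that a page limit on a free crawl can lower the 125-page cap but never raise it is mine, not something stated above.

```python
from typing import Optional

FREE_CRAWL_PAGE_CAP = 125  # from the description of free crawls


def effective_page_limit(deluxe: bool, page_limit: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper: max pages to visit, or None for no limit."""
    if not deluxe:
        # Assumption: a user limit can only tighten the free-crawl cap.
        if page_limit is None:
            return FREE_CRAWL_PAGE_CAP
        return min(page_limit, FREE_CRAWL_PAGE_CAP)
    # Deluxe crawls are unlimited unless a page limit is given.
    return page_limit
```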
The politeness delay defines how many seconds Nikita will wait between visits to your site. The minimum value is 3.0 seconds, the maximum is 60.0 and the default is 5.0. Higher values are nicer to your Web server but will increase the amount of time it takes Nikita to gather what she needs to compile your reports. This value can be a float.
If your site is large and you'd like to use a lower delay (e.g. 1/10th of a second), let me know and I can lower it manually.
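Here's a minimal Python sketch of a politeness delay using the limits above (minimum 3.0, maximum 60.0, default 5.0 seconds). It assumes out-of-range values are clamped rather than rejected, which may not be what Nikita does, and `polite_fetch_all()` with its `fetch` callback is a hypothetical stand-in for real page retrieval.

```python
import time
from typing import Callable, Iterable, Optional

MIN_DELAY, MAX_DELAY, DEFAULT_DELAY = 3.0, 60.0, 5.0


def clamp_delay(requested: Optional[float]) -> float:
    """Assumption: out-of-range requests are clamped into [3.0, 60.0]."""
    if requested is None:
        return DEFAULT_DELAY
    return max(MIN_DELAY, min(MAX_DELAY, float(requested)))


def polite_fetch_all(urls: Iterable[str], fetch: Callable[[str], None],
                     delay: Optional[float] = None) -> None:
    """Call fetch() on each URL, pausing between consecutive visits."""
    wait = clamp_delay(delay)
    for i, url in enumerate(urls):
        if i:
            time.sleep(wait)  # no pause before the very first request
        fetch(url)
```

A larger delay simply stretches the loop: with the default 5.0 seconds, 125 pages take at least 124 × 5 = 620 seconds of waiting alone.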
Use a URL filter regular expression
(regex) when you want Nikita to skip some of the pages on your site and those
pages aren't restricted to a simple subtree. If any part of a URL matches the filter,
Nikita ignores that URL. For instance, the regex below
matches http://example.com. It also matches the English language
version of that site at http://en.example.com, the Ukrainian version at
http://uk.example.com as well as the Polish, Danish and French versions. (Their
translators stay very busy.)
http://(?:en\.|uk\.|pl\.|dk\.|fr\.)??example\.com/
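You can try the regex above with Python's `re` module before handing it to Nikita. This sketch uses `re.search` rather than `re.match` because a URL is skipped if any part of it matches; the helper name `is_filtered()` is hypothetical. (The `??` is a lazy optional quantifier; a plain `?` would match the same set of URLs here.)

```python
import re

FILTER = re.compile(r"http://(?:en\.|uk\.|pl\.|dk\.|fr\.)??example\.com/")


def is_filtered(url: str) -> bool:
    """True if any part of the URL matches the filter, so it's skipped."""
    return FILTER.search(url) is not None


if __name__ == "__main__":
    for url in ("http://example.com/",
                "http://en.example.com/about.html",
                "http://de.example.com/"):
        print(url, is_filtered(url))
```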
If you negate the filter, Nikita will default to not visiting URLs and will only visit those that match the filter regex.
The regular expressions must be valid Python regular expressions. Python's regular expression syntax is reasonably close to Perl's.
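The effect of negating the filter can be sketched in a few lines of Python: normally a match means "skip", and negation flips that so only matching URLs are visited. The function name and the `negate` parameter are hypothetical; this shows the rule, not Nikita's code.

```python
import re


def should_visit(url: str, pattern: str, negate: bool = False) -> bool:
    """Apply a URL filter regex, optionally negated."""
    matched = re.search(pattern, url) is not None
    # Normal filter: matching URLs are skipped.
    # Negated filter: only matching URLs are visited.
    return matched if negate else not matched
```

For example, with the pattern `/wombats/`, a normal filter keeps Nikita out of the wombat pages, while a negated one keeps her inside them.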
Warning about XHTML 1.0 sent
as text/html is off by default.
Normally, Nikita issues a warning in her
reports when a page's media type contradicts its doctype (e.g. HTML 4.01 sent as
application/xhtml+xml). However, under certain conditions it's
permissible to send XHTML
1.0 as text/html (although
there are strong arguments against doing so).
Nikita can't check for those conditions.
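Here's a deliberately simplified Python sketch of the media type check and how this option could relax it. The doctype-to-media-type table is a hypothetical reduction (real XHTML can also be served with other XML media types), and the function and parameter names are made up for illustration.

```python
# Hypothetical, simplified table of acceptable media types per doctype.
ACCEPTABLE_TYPES = {
    "HTML 4.01": {"text/html"},
    "XHTML 1.0": {"application/xhtml+xml"},
    "XHTML 1.1": {"application/xhtml+xml"},
}


def media_type_warning(doctype: str, media_type: str,
                       warn_xhtml10_as_html: bool = False) -> bool:
    """True if the media type contradicts the doctype."""
    allowed = set(ACCEPTABLE_TYPES.get(doctype, set()))
    if doctype == "XHTML 1.0" and not warn_xhtml10_as_html:
        # Option off (the default): tolerate XHTML 1.0 sent as text/html.
        allowed.add("text/html")
    # Unknown doctypes produce no warning in this sketch.
    return bool(allowed) and media_type not in allowed
```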
WWW equivalence means that
Nikita will ignore a leading www. when comparing URLs on your site.
For instance, Nikita considers http://www.example.net/foo.html and
http://example.net/foo.html as equivalent URLs; if she has visited the first, she
won't visit the second. It's a rare Web site on which these point to different
pages, but if yours is one of them you should turn this option off.
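One way to implement WWW equivalence is to normalize each URL into a comparison key with any leading www. removed from the host, as in this Python sketch. The helper names are hypothetical, and Nikita may normalize URLs differently (for instance, this sketch also lowercases the host but leaves the path alone).

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def www_equivalent_key(url: str) -> str:
    """Comparison key with a leading 'www.' stripped from the host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def dedupe(urls: List[str], www_equivalence: bool = True) -> List[str]:
    """Keep the first of any URLs considered equivalent."""
    seen = set()
    result = []
    for url in urls:
        key = www_equivalent_key(url) if www_equivalence else url
        if key not in seen:
            seen.add(key)
            result.append(url)
    return result
```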
The user agent extension is appended
to Nikita's default user agent string. This allows you to
put a specific string in the user agent so that you can easily filter Nikita's visits
out of your Web server logs. Note that you can't replace Nikita's default user agent; you can
only add to it. The default user agent string is Nikita the Spider (http://NikitaTheSpider.com/). You
can add up to 100 characters to this string.
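This Python sketch shows the idea: the extension is appended to the fixed default string and capped at 100 characters. The single space used as a separator is an assumption, as are the helper name and the choice to raise an error on overly long extensions.

```python
DEFAULT_USER_AGENT = "Nikita the Spider (http://NikitaTheSpider.com/)"
MAX_EXTENSION_LENGTH = 100


def build_user_agent(extension: str = "") -> str:
    """Append an optional extension to the default user agent string."""
    extension = extension.strip()
    if len(extension) > MAX_EXTENSION_LENGTH:
        raise ValueError(
            f"extension is {len(extension)} characters; "
            f"the limit is {MAX_EXTENSION_LENGTH}")
    # Assumption: one space separates the default string from the extension.
    return f"{DEFAULT_USER_AGENT} {extension}" if extension else DEFAULT_USER_AGENT
```

With an extension such as `mysite-audit-42`, a plain substring search for that token in your access log picks out Nikita's visits.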
The maximum validation message limit allows you to cap the number of validation messages reported for each page. A page with the right kind of mistake (a typo in the doctype, for instance) can generate hundreds of spurious validation messages, which makes the reports unwieldy. This defaults to 500. Setting it to zero (blank) tells Nikita to report all validation messages.
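The cap works roughly like this Python sketch: keep the first N messages for a page and count the rest, with zero (or blank, modeled here as `None`) meaning "report everything". The function name is hypothetical, and reporting the omitted count is an illustrative extra, not something the text says Nikita does.

```python
from typing import List, Optional, Sequence, Tuple


def cap_messages(messages: Sequence[str],
                 limit: Optional[int] = 500) -> Tuple[List[str], int]:
    """Return (messages to report, number omitted). 0 or None means no cap."""
    if not limit:
        return list(messages), 0
    kept = list(messages[:limit])
    return kept, len(messages) - len(kept)
```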
The maximum report size limits the size of each HTML report file to a certain number of kilobytes, so "500" means about ½ megabyte. Nikita breaks up the page reports over as many files as necessary, so a smaller number here means more report files, and a bigger number means fewer, larger report files. This is measured in KiB (1 KiB = 1,024 bytes) and defaults to 350.
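Here's a minimal Python sketch of splitting page reports across files under a KiB budget. It's an assumption-laden illustration: it measures only the UTF-8 size of each page's report (ignoring any HTML wrapper a real report file would have), never splits a single page's report, and gives an oversized report a file of its own. The function name is hypothetical.

```python
from typing import List


def split_reports(page_reports: List[str], max_kib: int = 350) -> List[List[str]]:
    """Group page reports into files of at most max_kib KiB each."""
    budget = max_kib * 1024  # KiB to bytes
    files: List[List[str]] = []
    current: List[str] = []
    size = 0
    for report in page_reports:
        n = len(report.encode("utf-8"))
        if current and size + n > budget:
            files.append(current)  # this file is full; start a new one
            current, size = [], 0
        current.append(report)
        size += n
    if current:
        files.append(current)
    return files
```

For example, ten 1 KiB page reports with `max_kib=3` end up in four files holding 3, 3, 3 and 1 reports.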