Intelligent Spidering: Using Human Knowledge

Finally, intelligent spidering with the power to automatically read both HTML and text

Most spidering systems work by blindly following links up to a specified depth, leading to wild goose chases that end up far off topic. Not so with TAI’s “focused” spidering. NLP++ enables crawlers to encode the evidence that people use to decide where to look next. This yields efficient, fast, and cost-effective spidering.
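To make the contrast with depth-limited crawling concrete, here is a minimal Python sketch of a best-first ("focused") crawl. It is not NLP++ and not TAI's implementation: the keyword list, the `score_link` heuristic, and the `get_links` callback are all hypothetical stand-ins for the richer evidence a real analyzer would encode.

```python
import heapq

# Hypothetical evidence: words suggesting a link leads somewhere on-topic.
TOPIC_WORDS = {"jobs", "careers", "openings"}


def score_link(anchor_text, url):
    """Count topic words found in the anchor text or the URL (toy heuristic)."""
    text = (anchor_text + " " + url).lower()
    return sum(word in text for word in TOPIC_WORDS)


def focused_crawl(start, get_links, max_pages=10):
    """Visit pages best-first by link score instead of by fixed depth.

    get_links(url) must return a list of (anchor_text, url) pairs; it is
    supplied by the caller (for example, a fetcher plus a link extractor).
    """
    frontier = [(0, 0, start)]  # (negated score, tie-breaker, url)
    seen = {start}
    visited = []
    counter = 1
    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor, link in get_links(url):
            if link in seen:
                continue
            seen.add(link)
            score = score_link(anchor, link)
            if score == 0:
                continue  # no evidence of relevance: do not follow
            heapq.heappush(frontier, (-score, counter, link))
            counter += 1
    return visited
```

Because links with no supporting evidence are never queued, the crawl stays on topic regardless of how deep the site is. A real spider would replace the keyword count with whatever evidence a person would use.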

All the HTML, All the Text

TAI’s NLP++ not only allows for spidering of unstructured text, but also serves as an expert in HTML, for example by looking inside URLs (web addresses and links). HTML often gives clues as to where to spider next, and NLP++ gives you the power to act on those clues.
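As an illustration of what "looking inside" a URL can mean, the following Python sketch breaks a web address into pieces that a spider could treat as evidence. It uses only the standard library; the function name `url_clues` and the choice of which pieces to keep are assumptions made for this example, not part of NLP++.

```python
import re
from urllib.parse import urlsplit, parse_qs


def url_clues(url):
    """Split a URL into host, path words, and query keys (illustrative only)."""
    parts = urlsplit(url)
    # Break the path on anything that is not a letter or digit.
    path_tokens = [t for t in re.split(r"[^a-z0-9]+", parts.path.lower()) if t]
    return {
        "host": parts.hostname or "",
        "path_tokens": path_tokens,
        "query_keys": sorted(parse_qs(parts.query)),
    }
```

For a link such as `https://www.example.com/about-us/Press_Releases/2020.html?id=7&lang=en`, the path words `press` and `releases` already hint at what the page contains before it is ever fetched.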

Most spiders find links, and then do “tricks” to find the interesting ones. No tricks with TAI spiders. NLP++ uses as much or as little linguistic knowledge as a particular spider needs, so the spider can be built quickly and run efficiently.

Power on the Page

Once on the page, a specialized TAI analyzer extracts anything needed from that type of page. In other words, you have the full power of NLP++ available once you land on the web pages of interest.

Restricted Spidering: Just the Good Stuff

Not Too Much, Not Too Little

You can harness intelligent spidering to focus only on the information of interest, ignoring irrelevant pages and irrelevant parts of pages.
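Ignoring parts of a page can be sketched in a few lines of Python. The example below keeps the text of a page while skipping regions that are assumed here to be uninteresting (navigation, footers, scripts, styles). The `SKIP_TAGS` set and the class name are hypothetical choices for illustration; a real analyzer would decide what to skip from the page type and content.

```python
from html.parser import HTMLParser

# Assumed "not of interest" regions for this sketch.
SKIP_TAGS = {"nav", "footer", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect page text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped regions we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text chunks found outside the skipped regions."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Only the text outside the skipped regions is returned, so downstream extraction sees "just the good stuff" rather than menus and boilerplate.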