Intelligent Spidering: Using Human Knowledge
Finally, intelligent spidering with the power to automatically read both HTML and text.
Restricted Spidering
Most spidering systems work by blindly following links up to a specified depth, leading to wild goose chases that are way off topic. Not so with TAI’s “focused” spidering. NLP++ enables crawlers to encode the evidence that people use to decide where to look next. This yields efficient, fast, and cost-effective spidering.
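To make the contrast with depth-limited crawling concrete, here is a minimal sketch of a best-first ("focused") crawl. It is written in Python rather than NLP++ and is not TAI's implementation. The toy link graph `TOY_WEB`, the word list `TOPIC_WORDS`, and the functions `link_score` and `focused_crawl` are all hypothetical names chosen for illustration, and simple keyword overlap on link text stands in for the much richer evidence a real analyzer could encode.

```python
import heapq

# Hypothetical in-memory "web": page -> list of (link text, target page).
TOY_WEB = {
    "/home": [("Products", "/products"), ("Careers", "/careers"),
              ("Press releases", "/press")],
    "/products": [("Pricing", "/pricing")],
    "/careers": [("Open jobs", "/jobs")],
    "/press": [("Quarterly earnings report", "/press/earnings")],
    "/press/earnings": [],
    "/pricing": [],
    "/jobs": [],
}

# Hypothetical evidence: words a person would look for when hunting earnings news.
TOPIC_WORDS = {"press", "earnings", "report", "quarterly"}


def link_score(link_text):
    """Count how many topic words appear in the link text."""
    return len(set(link_text.lower().split()) & TOPIC_WORDS)


def focused_crawl(start, max_pages):
    """Visit pages best-first, never following links with no topic evidence."""
    frontier = [(0, 0, start)]  # (negated score, insertion order, page)
    seen = {start}
    visited = []
    order = 1
    while frontier and len(visited) < max_pages:
        _, _, page = heapq.heappop(frontier)
        visited.append(page)
        for link_text, target in TOY_WEB.get(page, []):
            score = link_score(link_text)
            if score == 0 or target in seen:
                continue  # off topic or already queued: do not chase it
            seen.add(target)
            heapq.heappush(frontier, (-score, order, target))
            order += 1
    return visited


print(focused_crawl("/home", max_pages=10))
```

On this toy graph the crawl goes from the home page straight to the press pages and never touches the product or careers branches, whereas a depth-2 crawl would have fetched every page.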
All the HTML, All the Text
TAI’s NLP++ not only allows for spidering of unstructured text, but also serves as an expert in HTML, for example by looking inside URLs (web addresses and links). HTML often gives clues as to where to spider next, and NLP++ gives you the power needed to make those decisions.
Most spiders find links, and then do “tricks” to find the interesting ones. No tricks with TAI spiders. NLP++ uses as much or as little linguistic knowledge as a particular spider needs, so that the spider stays fast and efficient.
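As an illustration of what "looking inside URLs" can mean, the sketch below splits a URL into its parts and uses them as evidence for whether a link is worth following. It uses Python's standard `urllib.parse` module, not NLP++, and the cue lists (`PROMISING_SEGMENTS`, `SKIP_SEGMENTS`, `SKIP_EXTENSIONS`), the `sessionid` check, and the function `url_evidence` are assumptions made up for this example rather than rules taken from any TAI spider.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical cues; a real spider would tune these for its own topic.
PROMISING_SEGMENTS = {"news", "press", "articles"}
SKIP_SEGMENTS = {"login", "cart", "ads"}
SKIP_EXTENSIONS = (".jpg", ".png", ".css", ".js")


def url_evidence(url):
    """Classify a URL as 'follow', 'skip', or 'maybe' from its structure alone."""
    parts = urlparse(url)
    path = parts.path.lower()
    segments = [s for s in path.split("/") if s]
    query = parse_qs(parts.query)

    if path.endswith(SKIP_EXTENSIONS):
        return "skip"  # images, stylesheets, scripts: nothing to read
    if any(s in SKIP_SEGMENTS for s in segments):
        return "skip"  # account and advertising pages
    if "sessionid" in query:
        return "skip"  # session-specific copies of other pages
    if any(s in PROMISING_SEGMENTS for s in segments):
        return "follow"
    return "maybe"  # no evidence either way; let the link text decide


for url in ("https://example.com/press/2020/launch.html",
            "https://example.com/static/logo.png",
            "https://example.com/about"):
    print(url, "->", url_evidence(url))
```

In practice a decision like this would be combined with evidence from the link text and the surrounding HTML, not used on its own.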
Power on the Page
Once on the page, a specialized TAI analyzer extracts anything needed from that type of page. In other words, you have the full power of NLP++ available once you land on the web pages of interest.
Restricted Spidering: Just the Good Stuff
Not Too Much, Not Too Little
You can harness intelligent spidering to focus only on the information of interest, ignoring irrelevant pages and irrelevant parts of pages.