Recipe search app: Getting the documents

Instead of using Scrapy as the crawler, I’ve switched to forage-fetch from the Forage Search Engine. Forage is a search engine written in JavaScript on top of Node.js, and it’s still a bit unfinished. The main reason I chose it is that it’s much simpler to use than Scrapy. It also separates fetching the documents from processing them into something indexable (JSON). That means I only have to hit the site once and can rerun the processing step against the local copies as often as I like, so there’s less harassing of the site I crawl.
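
To illustrate that split between fetching and processing, here’s a rough sketch of what an offline processing pass could look like. It is not Forage’s own code: `extractTitle` and `processDirectory` are names made up for this example, and it assumes the fetched pages sit as plain HTML files directly inside one directory. The point is only that nothing in it touches the network.

```javascript
// Sketch only: turns already-fetched HTML files into JSON-ready records.
// This is NOT Forage's processing step, just an illustration of the idea.
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: pull the <title> text out of an HTML string.
// A regex is good enough for a sketch; a real pass would use an HTML parser.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Hypothetical helper: read every regular file in `dir` (assumed to be
// saved HTML) and build one small record per file.
function processDirectory(dir) {
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => {
      const html = fs.readFileSync(path.join(dir, entry.name), "utf8");
      return { file: entry.name, title: extractTitle(html) };
    });
}

// Usage (assumes the crawl directory already exists):
// console.log(JSON.stringify(processDirectory("crawl-html/"), null, 2));
```

If the extraction rules turn out to be wrong, only this pass needs to run again. The crawl itself stays untouched.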

So, the first task is to crawl all the documents. After installing Forage, the crawling was simple:

$ forage-fetch -d crawl-html/ -n

[Screenshot: forage-fetch chugging along nicely]

So, now I’ve got 3788 documents. I’m not sure that’s all of them, but it’s close enough for now, and it only took a couple of minutes. The next thing is understanding jQuery selectors so I can cherry-pick what I want out of the recipes. It seems pretty straightforward, but that’ll be another day.
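
As a quick sanity check on that document count, something like the sketch below could count what actually landed on disk. It assumes each fetched document is stored as one file somewhere under the crawl directory, which I haven’t verified against how forage-fetch lays out its output. `countFiles` is a made-up helper name.

```javascript
// Sketch only: recursively count regular files under a directory.
// Assumes one fetched document per file; that layout is an assumption.
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: walk `dir` and return the number of regular files.
function countFiles(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.isDirectory()) {
      // Descend into subdirectories in case the crawler nests its output.
      total += countFiles(path.join(dir, entry.name));
    } else if (entry.isFile()) {
      total += 1;
    }
  }
  return total;
}

// Usage (assumes the crawl directory already exists):
// console.log(countFiles("crawl-html/"));
```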

