Recipe search app: Getting the documents

Instead of using Scrapy as the crawler, I’ve switched to forage-fetch from the Forage search engine. Forage is a search engine written in JavaScript on top of Node.js, and it’s still a bit unfinished. The main reason I chose it is that it’s really simple to use compared to Scrapy. It also separates fetching the documents from processing them into something indexable (JSON). That means less harassing of the site I crawl, since I can rework the processing step without fetching the pages again.

So, the first task is to crawl all the documents. After installing Forage, the crawling was simple:

$ forage-fetch -d crawl-html/ -n oppskrift.klikk.no

forage-fetch chugging along nicely

So, now I have 3788 documents. I’m not sure that’s all of them, but it’s close enough for now, and it only took a couple of minutes. The next thing is understanding jQuery selectors so I can cherry-pick what I want out of the recipes. It seems pretty straightforward, but that’ll be another day.
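
To give an idea of what the offline processing step could look like, here is a minimal sketch in Python. It is not Forage's own processing and it doesn't use jQuery selectors: it swaps in the standard library's html.parser and only pulls out the page title. It assumes the crawl directory holds one saved HTML file per page, which I haven't verified, and the names extract_title and process_directory are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Join the chunks and collapse runs of whitespace.
        return " ".join("".join(self._parts).split())


def extract_title(html):
    """Return the text of the first <title>, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def process_directory(directory):
    """Yield one JSON-ready dict per saved file.

    Assumption: every regular file in the directory is a fetched HTML page.
    """
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            html = path.read_text(encoding="utf-8", errors="replace")
            yield {"id": path.name, "title": extract_title(html)}
```

Since this only reads files that are already on disk, something like list(process_directory("crawl-html/")) can be rerun as often as needed while tweaking the extraction, without touching the site again.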

Disagree, have a comment or want to pitch in? Your thoughts are more than welcome =)
