Wednesday, November 30, 2011

Hello there...

Hi everybody, and welcome to my blog!

I am currently developing a search engine specifically for recipes.
As a start, I am only targeting the Belgian market (since I am from Belgium myself).
For a couple of years now, TV cooks and cooking shows have been a booming business in Belgium.
A natural consequence is that there are countless recipe sites.

This all sounds very nice, but these sites all work separately. None of them link to each other, so if you want to make a certain dish, you have to go through a whole list of bookmarks to see all the different recipes they have to offer.

With my website, I want to make all of those recipe sites searchable at once.
This way, you can go to a single webpage and find all the different recipes for the dish you want to cook in one place! Now wouldn't that be nice :-)

On this blog, I will keep you posted on the progress of building the website (or sometimes, on my attempts at progress :-)).

At the moment, the site hasn't launched yet, but I have been working on it on and off for the last couple of months. When it's finished, you'll be able to visit it at moam.be.

Name
You probably all think that moambe is a really strange name. Well, you're right, it is a strange name!
Since I'm only targeting Belgium for now, I naturally wanted a .be TLD.
I also wanted a domain name where the TLD is part of the website name
(you know, like mojolicio.us or youtu.be).
After a bit of thinking, I came up with moam.be, derived from the traditional Congolese dish moambe.

Technology
The technology used to develop the site is currently:
There will probably still be additions to this list over time, but for now, that's it :-)

Architecture
The basic architecture of the application looks like this


There is a server-side program (the crawler) that crawls all the recipe websites (for now that's just zesta.be and njam.tv, but that list should grow a lot over time). For each recipe site, I write a separate crawler that scrapes the site and normalizes the recipe into an internal data structure.
That structure is saved as a document in CouchDB. If the site provides a thumbnail of the recipe, that is also scraped and sent to a content delivery network, or CDN (I'm trying to think about scaling at an early stage ;-)).
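To make the "one crawler per site, one shared structure" idea a bit more concrete, here is a minimal Python sketch. The `Recipe` fields, the `SiteCrawler` base class, the `ExampleSiteCrawler` with its raw keys, and the use of the source URL as the CouchDB `_id` are all hypothetical illustrations, not the actual moam.be code. Fetching pages and pulling the raw values out of the HTML is left out.

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Recipe:
    """Hypothetical internal structure that every site crawler normalizes into."""
    source_site: str                     # e.g. "zesta.be"
    url: str                             # link back to the original recipe
    title: str
    ingredients: List[str] = field(default_factory=list)
    instructions: str = ""
    thumbnail_url: Optional[str] = None  # only set if the site offers a picture


class SiteCrawler(ABC):
    """One subclass per recipe site; each knows its own site's quirks."""
    site: str = ""

    @abstractmethod
    def parse(self, url: str, raw: dict) -> Recipe:
        """Turn site-specific scraped fields into the shared Recipe shape."""

    def to_document(self, recipe: Recipe) -> dict:
        # CouchDB stores JSON documents. Using the source URL as _id (an
        # assumption) gives each recipe a stable id, so a re-crawl maps to the
        # same document (updating it in CouchDB still requires the current _rev).
        doc = asdict(recipe)
        doc["_id"] = recipe.url
        doc["type"] = "recipe"
        return doc


class ExampleSiteCrawler(SiteCrawler):
    """Hypothetical crawler; the Dutch raw keys below are made up."""
    site = "example.be"

    def parse(self, url: str, raw: dict) -> Recipe:
        return Recipe(
            source_site=self.site,
            url=url,
            title=raw["naam"].strip(),
            # Drop empty ingredient lines that scraping often leaves behind.
            ingredients=[i.strip() for i in raw.get("ingredienten", []) if i.strip()],
            instructions=raw.get("bereiding", "").strip(),
            thumbnail_url=raw.get("foto"),
        )
```

Every new site then only needs its own `parse` method, while everything downstream (CouchDB, search, the CDN) sees the same document shape.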
The CDN is a separate web service. For now it just normalizes the size (100 x 100 px) and file type (PNG) of the thumbnail and stores it on disk in a structured directory hierarchy.
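The exact directory layout isn't described here, so the following Python sketch shows just one plausible scheme: hash the thumbnail's source URL and use the first hex digits as nested directories, so that no single directory ends up holding every file. `thumbnail_path`, the SHA-1 choice and the two-level sharding are assumptions. `fit_within` shows the size arithmetic for scaling an image into the 100 x 100 px box while keeping its aspect ratio; the actual resizing and PNG conversion would be done by an image library (for example Pillow) and is not shown.

```python
import hashlib
from pathlib import PurePosixPath

THUMB_SIZE = (100, 100)  # every thumbnail ends up at most 100 x 100 px
THUMB_EXT = "png"        # ... and is stored as PNG


def thumbnail_path(root: str, source_url: str) -> PurePosixPath:
    """Map an original image URL to a sharded path on disk (hypothetical scheme).

    Hashing the URL gives a stable, filesystem-safe file name; the first two
    pairs of hex digits become nested directories.
    """
    digest = hashlib.sha1(source_url.encode("utf-8")).hexdigest()
    return PurePosixPath(root, digest[:2], digest[2:4], f"{digest}.{THUMB_EXT}")


def fit_within(width: int, height: int, box=THUMB_SIZE) -> tuple:
    """Scale (width, height) to fit inside the box, keeping the aspect ratio."""
    scale = min(box[0] / width, box[1] / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For an exact 100 x 100 px result, the scaled image can then be centered on a 100 x 100 canvas; a 400 x 300 photo, for instance, scales to 100 x 75.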

When a document is added to CouchDB, Elasticsearch is automatically notified (via a river) and immediately indexes the new document.
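The CouchDB river works by following CouchDB's `_changes` feed and pushing each change into Elasticsearch. The Python sketch below shows that idea in isolation, assuming the feed was read with `include_docs=true`: it turns change rows into a request body for Elasticsearch's bulk API. It is not the river plugin's own code; `changes_to_bulk` and the index name are hypothetical, and the action lines use the newer format without a mapping type.

```python
import json
from typing import Iterable, List


def changes_to_bulk(rows: Iterable[dict], index: str) -> str:
    """Convert CouchDB _changes rows into an Elasticsearch bulk request body.

    Assumes include_docs=true, so each non-deleted row carries its full
    document under "doc".
    """
    lines: List[str] = []
    for row in rows:
        meta = {"_index": index, "_id": row["id"]}
        if row.get("deleted"):
            # A deleted CouchDB document should disappear from the index too.
            lines.append(json.dumps({"delete": meta}))
            continue
        # Drop CouchDB's own bookkeeping fields (_id, _rev, ...) from the source.
        source = {k: v for k, v in row["doc"].items() if not k.startswith("_")}
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    # The bulk API expects newline-delimited JSON, terminated by a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Keeping the CouchDB document id as the Elasticsearch `_id` means re-indexing an updated recipe simply replaces the old entry instead of creating a duplicate.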

So when you visit the site and search for something, the web app queries Elasticsearch for matching results, gets the corresponding thumbnails from the CDN and renders everything back to your browser, et voilà!
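The search step can be sketched as two small Python functions: one that builds an Elasticsearch query body, and one that turns the response into what a results page needs. The field names (`title`, `ingredients`, `instructions`, `url`, `thumbnail`), the boost on `title`, and the assumption that each document stores its CDN thumbnail URL under `thumbnail` are all hypothetical. Sending the request over HTTP is left out.

```python
from typing import List, NamedTuple, Optional


class SearchResult(NamedTuple):
    title: str
    url: str                  # link to the original recipe site
    thumbnail: Optional[str]  # CDN URL, or None if the recipe has no picture


def build_query(text: str, size: int = 20) -> dict:
    """Elasticsearch query body matching the search text against a few fields."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # Hypothetical field names; "^2" makes title matches count double.
                "fields": ["title^2", "ingredients", "instructions"],
            }
        },
    }


def to_results(response: dict) -> List[SearchResult]:
    """Turn an Elasticsearch search response into rows for the results page."""
    results = []
    for hit in response["hits"]["hits"]:
        src = hit["_source"]
        results.append(SearchResult(
            title=src["title"],
            url=src["url"],
            thumbnail=src.get("thumbnail"),  # hypothetical field set at crawl time
        ))
    return results
```

Storing the thumbnail's CDN URL in the document itself means the web app only needs one round trip to Elasticsearch per search; the browser then loads the images straight from the CDN.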

Because I don't want to infringe any copyrights, I don't show the full recipe, just the first 100 characters. To read the full recipe, you click it, which sends you to the original recipe on the site it was scraped from.
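Cutting a recipe at 100 characters is easy to get slightly wrong, for example by stopping mid-word or keeping stray newlines from the scraped text. Here is a minimal Python sketch with a hypothetical `preview` helper that collapses whitespace, cuts at the last word boundary within the limit and appends an ellipsis. Backing up to a word boundary is a design choice, not something the site necessarily does.

```python
def preview(text: str, limit: int = 100) -> str:
    """Return at most `limit` characters of text, plus an ellipsis if cut."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut lands mid-word, back up to the last space (when there is one).
    if text[limit] != " " and " " in cut:
        cut = cut[:cut.rindex(" ")]
    return cut.rstrip() + "…"
```

Because of the word-boundary rule, a preview can be a few characters shorter than 100; the ellipsis signals that the rest is waiting on the original site.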
 
Next steps
Right now, I'm writing unit tests for all the code I've written so far (I know, I know, I should have done that a lot sooner), so actual development has slowed down for the moment.
Once everything is tested properly, all I need to do is find a hosting service for all my stuff and launch!
I'm sure it will be a fantastic and interesting journey.
Wanna join?

See you guys later,
ldx