I am currently developing a search engine specifically for recipes.
To start, I am only targeting the Belgian market (since I am from Belgium myself).
For a couple of years now, television cooks and cooking shows have been a booming business in Belgium.
A natural consequence of that is that there are millions of recipe sites.
That seems very nice, but they all work separately. None of these recipe sites link to each other, so if you're looking to make a certain dish, you have to go through a whole list of bookmarks to see all the different recipes they have to offer.
My intention with this website is to make all those other websites searchable.
This way, you can go to a single webpage and find all the different recipes for the dish you want to cook in one place! Now wouldn't that be nice :-)
On this blog, I will keep you posted on the progress of the website (or sometimes, the attempt at progress :-)).
At the moment, the site hasn't launched yet, but I have been working on it on and off for the last couple of months. When it's finished, you'll be able to visit it at moam.be.
You probably all think that moambe is a really strange name. Well, you're right, it is a strange name!
Since I'm just targeting Belgium for now, I naturally wanted a .be TLD.
I also wanted a domain name where the TLD is part of the website name (you know, like mojolicio.us or youtu.be).
After a bit of thinking, I came up with moam.be, derived from the traditional Congolese dish moambe.
The technology used to develop the site is currently:
- Perl: My favorite language :-)
- Mojolicious is at the base of moam.be
- Mojo is also used separately in the back-end
- Moose is also used extensively (more specifically, everything is made via MooseX::Declare)
- Strawberry Perl is my distribution
- For testing I use
- CouchDB: A NoSQL database
- jQuery is used in the front-end
- ElasticSearch is used as the search engine
- All images are created and manipulated with the GIMP
- For version control, I use Mercurial, hosted on Bitbucket
- And finally: my editor of choice is Emacs
The basic architecture of the application looks like this:
There is a server-side program (the crawler) that crawls all the recipe websites. For now that's just zesta.be and njam.tv, but that list should become a lot larger over time. For each recipe site, I write a separate crawler that scrapes the site and normalizes each recipe to an internal data structure.
That structure is saved as a document in CouchDB. If the site provides a thumbnail of the recipe, then that's also scraped and sent to a content delivery network (I'm trying to think about scaling at an early stage ;-))
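To make the "normalize to an internal data structure" step a bit more concrete, here is a minimal sketch. It is written in Python purely for illustration and is not the site's actual Perl code. The field names, the `normalize_example_site` function and the raw keys (`naam`, `bereiding`, ...) are all made up; the real structure may look quite different.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Recipe:
    """Hypothetical shared structure that every per-site crawler produces."""
    title: str
    source_site: str
    source_url: str
    ingredients: List[str] = field(default_factory=list)
    instructions: str = ""
    thumbnail_url: Optional[str] = None


def normalize_example_site(raw: dict) -> Recipe:
    """Map one site's scraped fields (invented keys) onto the shared structure."""
    return Recipe(
        title=raw["naam"].strip(),
        source_site="example-site",
        source_url=raw["url"],
        # Drop empty entries and surrounding whitespace left over from scraping.
        ingredients=[i.strip() for i in raw.get("ingredienten", []) if i.strip()],
        instructions=raw.get("bereiding", "").strip(),
        thumbnail_url=raw.get("foto"),
    )


def to_document(recipe: Recipe) -> str:
    """Serialize the recipe as the JSON body of a document to store."""
    return json.dumps(asdict(recipe), ensure_ascii=False)
```

Each site would get its own `normalize_...` function, while everything downstream only ever sees `Recipe` objects.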
The CDN is a separate web service. For now it just normalizes the size (100 x 100px) and file type (png) of the thumbnail and stores it on disk in a structured directory tree.
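One possible way to lay out such a directory tree is to derive the path from a hash of the thumbnail's source URL, so files spread evenly over subdirectories. This is only a sketch of that idea in Python, not necessarily what the CDN does; `thumbnail_path` and the two-level layout are assumptions, and the actual resizing to 100 x 100px is left out.

```python
import hashlib
from pathlib import PurePosixPath


def thumbnail_path(root: str, source_url: str) -> PurePosixPath:
    """Return a stable storage path like <root>/ab/cd/abcd....png (assumed layout)."""
    digest = hashlib.sha1(source_url.encode("utf-8")).hexdigest()
    # The first two pairs of hex characters become two directory levels.
    return PurePosixPath(root) / digest[:2] / digest[2:4] / f"{digest}.png"
```

Because the path depends only on the URL, scraping the same thumbnail twice overwrites the same file instead of creating a duplicate.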
When a document is added to CouchDB, ElasticSearch is automatically notified (via a river) and immediately indexes the new document.
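For the curious: a CouchDB river is registered by PUTting a small JSON document to ElasticSearch. The sketch below (Python, for illustration) only builds that request and does not send it. The layout follows the CouchDB river plugin's documented settings as I remember them; the database name, host and port are placeholders, not the real configuration.

```python
import json


def couchdb_river_request(db: str, host: str = "localhost", port: int = 5984):
    """Build the path and JSON body that would register a CouchDB river."""
    path = f"/_river/{db}/_meta"
    body = {
        "type": "couchdb",
        "couchdb": {"host": host, "port": port, "db": db},
        # Index and type names are assumed to simply mirror the database name.
        "index": {"index": db, "type": db},
    }
    return path, json.dumps(body)
```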
So when you visit the site and search for something, the web app talks to ElasticSearch to get the desired results, fetches the corresponding thumbnails from the CDN, and renders everything back to your browser. Et voilà!
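The search request the web app sends could look roughly like the body built below. Again, this is a Python sketch under assumptions: the field names (`title`, `ingredients`, `instructions`) and the boost on the title are invented for the example. `multi_match` itself is a standard ElasticSearch query type.

```python
def build_search_body(terms: str, size: int = 10) -> dict:
    """Build a search body that looks for the terms in several recipe fields."""
    return {
        "query": {
            "multi_match": {
                "query": terms,
                # "title^2" weights matches in the title twice as heavily.
                "fields": ["title^2", "ingredients", "instructions"],
            }
        },
        "size": size,
    }
```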
Because I don't want to infringe on any copyrights, I don't show the full recipe, just the first 100 characters. To read the full recipe, you need to click it, and that sends you to the original recipe on the site from which it was scraped.
At the moment, I'm writing unit tests for all the code that has already been written (I know, I know, I should have done that a lot sooner), so actual development is very slow right now.
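Cutting a recipe down to its first 100 characters is simple enough to sketch in a few lines of Python. Collapsing whitespace first and appending "..." to shortened text are my own additions for the example, not necessarily what the site does.

```python
def excerpt(text: str, limit: int = 100) -> str:
    """Return at most `limit` characters of text, marking any cut with '...'."""
    collapsed = " ".join(text.split())  # squeeze newlines and repeated spaces
    if len(collapsed) <= limit:
        return collapsed
    return collapsed[:limit].rstrip() + "..."
```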
Once everything is tested properly, all I need to do is find a hosting service for all my stuff and launch!
I'm sure that it will be a fantastic and interesting journey.
See you guys later,