Wednesday, December 28, 2011

Header image

Hi all!

I know, I know, I haven't posted for some time, but it's been quite busy at home, at work and with the holidays and all...
I mainly just continued unit testing the crawlers, so not much new development on that front.

BUT: I did finish the header image! :-) Long live the GIMP!
I'm sure you've all seen the website's logo right here in the title of the blog.


It's actually one of the first things I created after the initial idea came to me.
It also allowed me to take my first steps into the world of (*cough*) graphic design.
I had no prior experience.

So now I've ventured a bit further and finished the header for the site.





Not bad, huh!
Well, at least not bad by my standards :-)
If any graphic designers are reading this blog and want to laugh at me, please do!
But when you're done, please comment with any suggestions for improvement.

bye
ldx

Monday, December 5, 2011

Mocking part of the class under test

I'm sure you all know that good unit testing involves extensive mocking of the dependencies of the code under test. To do this successfully, you need to think about separation of concerns as early as you can.
Therefore, all classes in moam.be are derived from an interface. When a class depends on another class, it uses that class through the interface instead of directly. With Moose, these interfaces are of course roles.

Example:
  1. The role

      use MooseX::Declare;

      role ICrawlerService {
        requires 'Crawl';
        requires 'CrawlSingle';
        requires 'CrawlNew';
      }

      1;

  2. The implementation

      use MooseX::Declare;
      use Service::ICrawlerService;

      class CrawlerService with ICrawlerService {

        method Crawl(Int $startPage?) {
          # do something
        }

        method CrawlSingle(Str $url!) {
          # do something
        }

        method CrawlNew() {
          # do something
        }
      }

      1;

  3. The depending class

      use MooseX::Declare;
      use Service::ICrawlerService;

      class CrawlerApp {

        has CrawlerService => (does => 'ICrawlerService',
                               is => 'ro',
                               required => 1);
      }

      1;

So now, CrawlerApp::CrawlerService will be set to CrawlerService in the actual application (either via dependency injection, or just manually). But when I'm testing, I can just replace that with a mock object that also implements the ICrawlerService role (I use Test::Mock::Class for this).
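Test::Mock::Class does the heavy lifting here, but the underlying idea is simple enough to sketch by hand. Below is a minimal, dependency-free version in plain Perl (no Moose): a fake service that provides the three methods the role requires and records how it was called. The plain-Perl CrawlerApp with its Run method and the FakeCrawlerService class are hypothetical, for illustration only. Note that a real Moose attribute declared with does => 'ICrawlerService' checks that the object actually consumes the role, so a hand-rolled fake like this one would be rejected there; that is exactly why a mock generated from the role is handy.

```perl
use strict;
use warnings;

{
    # Hypothetical plain-Perl stand-in for the Moose CrawlerApp above.
    package CrawlerApp;

    sub new {
        my ($class, %args) = @_;
        my $service = $args{CrawlerService}
            or die "CrawlerService is required\n";
        # Duck-typing check standing in for Moose's does => 'ICrawlerService'.
        for my $method (qw(Crawl CrawlSingle CrawlNew)) {
            die "CrawlerService cannot $method\n" unless $service->can($method);
        }
        return bless { CrawlerService => $service }, $class;
    }

    sub CrawlerService { return $_[0]{CrawlerService} }

    # Hypothetical method that delegates to the injected service.
    sub Run {
        my ($self, $startPage) = @_;
        return $self->CrawlerService->Crawl($startPage);
    }
}

{
    # Hand-written fake: provides the same methods and records every call.
    package FakeCrawlerService;

    sub new { my $class = shift; return bless { calls => [] }, $class }

    sub _record { my ($self, $name, @args) = @_; push @{ $self->{calls} }, [ $name, @args ] }

    sub Crawl       { my $self = shift; $self->_record(Crawl => @_);       return 'crawled' }
    sub CrawlSingle { my $self = shift; $self->_record(CrawlSingle => @_); return 'single' }
    sub CrawlNew    { my $self = shift; $self->_record(CrawlNew => @_);    return 'new' }

    sub calls { return @{ $_[0]{calls} } }
}

# In a test: inject the fake instead of the real CrawlerService.
my $fake   = FakeCrawlerService->new;
my $app    = CrawlerApp->new(CrawlerService => $fake);
my $result = $app->Run(5);
```

The test can then check both the result and the recorded calls (here a single Crawl(5)), without ever touching a real website.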

This is all really cool, but what if I want to mock out part of a class instead of an entire class?
Something like this:

  use MooseX::Declare;

  class MyClass {

    method A() {
      # do some stuff
      my $var = $self->B();
      # do some more stuff with $var
    }

    method B() {
      # return something on which A() depends
    }
  }

  1;

Unit testing teaches us that each unit of code must be tested separately. So if I want to test the A-method properly, I need to find a way to mock out the B-method.

As a side note: at work I program in C# and we use Rhino.Mocks as our mock library. The example above is something that cannot be done with Rhino.Mocks. This means that you need to write your own hand-crafted mock: a class derived from the class under test that overrides the behaviour of the B-method. This is a lot of work for a simple test and has frustrated me on numerous occasions! So you can imagine how overjoyed I was when I found out that Perl does have a way to do this:

Behold: Test::MockObject::Extends

This wonderful module allows me to do this in my test code:

  use Test::More;
  use Test::MockObject::Extends;

  my $cl = MyClass->new();
  $cl = Test::MockObject::Extends->new($cl);
  $cl->mock('B', sub {
    # return something on which A() depends
  });

  # test the A() method here: A keeps working as before, because
  # the mocked B still returns the kind of value that A depends on

  ok($cl->called('B'), 'A() called B()');   # check that B was indeed called by A
  done_testing();

TADAAAAAAAAA: I can test the A-method without being annoyed by its dependency on the B-method! As a bonus, I can even check that the A-method did in fact call the B-method.

I like it.
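Perl can do this because methods live in package symbol tables that can be swapped at runtime. For comparison, here is a minimal core-Perl sketch of a different technique than Test::MockObject::Extends: it temporarily replaces MyClass::B for the duration of one block using local on the typeglob, counting the calls along the way. The bodies of A and B are hypothetical, and unlike mocking a single object, this override affects every MyClass object while it is in scope.

```perl
use strict;
use warnings;

{
    package MyClass;
    sub new { my $class = shift; return bless {}, $class }

    # A depends on B; here it (hypothetically) doubles whatever B returns.
    sub A {
        my $self = shift;
        my $var  = $self->B();
        return $var * 2;
    }

    # The real B, which A's unit test should not have to run.
    sub B { die "real B() should not run in this test\n" }
}

my $b_calls = 0;
my $result;
{
    # Swap in a fake B for this block only; local restores the original at block exit.
    no warnings 'redefine';
    local *MyClass::B = sub { $b_calls++; return 21 };
    $result = MyClass->new->A();
}
```

Inside the block A sees the fake B (so $result ends up as 42 and $b_calls as 1); after the block, the real B is back in place.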

ldx

Thursday, December 1, 2011

Fit testing the crawlers

Testing the crawlers has proved quite difficult.
The operation of each crawler can be divided into two separate actions:
  • Fetch a webpage
  • Process the contents of that webpage
For the fetch part I use Mojo::UserAgent, so the correct way of testing would be to create a mock object for the user agent and mock its get method.

But:
get returns a Mojo::Transaction::HTTP object.
The res attribute of Mojo::Transaction::HTTP is a Mojo::Message::Response which then in turn contains the parsed contents of the webpage.

So, if I were to use that particular method of testing, I would have to create a Mojo::Message::Response and feed it with the data I want to use in the test. Then I would have to create a Mojo::Transaction::HTTP and add the response to it.
It would look like this:

  use Test::Mock::Class ':all';
  use Mojo::Transaction::HTTP;
  use Mojo::Message::Response;
  use File::Slurp;

  # a mock user agent, plus the response its get() should hand back
  my $mockUserAgent = mock_anon_class('Mojo::UserAgent')->new_object;
  my $getLastPageResponse = Mojo::Message::Response->new;
  $getLastPageResponse->code(200);
  # body() expects the content itself (read as one string), not a sub
  $getLastPageResponse->body(scalar read_file('testpage.html'));

  # get() returns a transaction, so wrap the response in one
  my $getLastPageTx = Mojo::Transaction::HTTP->new;
  $getLastPageTx->res($getLastPageResponse);
  $mockUserAgent->mock_return('get', $getLastPageTx, args => ['http://url.under.test']);

  # ... actual test ...

I would have to do that for each and every test I wanted to write.
I didn't like this at all, so I started looking for a better option. A more generic one that I would be able to use during the rest of my testing (or even other projects).

After some mucking about, I came across the idea of a mock server. Instead of mocking the user agent itself, I would just mock the webserver on the other end!
Granted, the test then depends on Mojo::UserAgent working correctly and on the mock server itself working correctly before any testing can be done.
But then again, it's a fit test, not a unit test! I want to test the entire application chain!

The MockServer
The idea is quite simple. I create a small Mojolicious::Lite application which is started from within the test via Mojo::Server. The application is given a dictionary of urls and corresponding pages. So when the server receives a request, it looks in the dictionary to see if it knows the url that is being requested and if it does, it returns the contents of the corresponding page.

  1. #!/usr/bin/env perl
  2. use Mojolicious::Lite;
  3. use File::Slurp;
  4. my $config = {};
  5. helper SetConfig => sub {
  6.   shift;
  7.   $config = shift;
  8. };
  9. get '/*path'  => sub {
  10.   my $self = shift;
  11.   my $path = $self->stash('path');
  12.   my @params = $self->param;
  13.   if (scalar @params > 0) {
  14.     $path .= '?';
  15.     foreach my $param (@params) {
  16.       $path .= $param . '=' . $self->param($param) . '&';
  17.     }
  18.     # chop off final '&'
  19.     $path = substr($path, 0, length($path) - 1);
  20.   }
  21.   if (defined $config->{$path}) {
  22.     my $data = read_file($config->{$path});   # whole file as one string
  23.     $self->render_data($data);
  24.   } else {
  25.     $self->render_text("unrecognized param: $path");
  26.   }
  27. };
  28. app->start;

Since I'm using Mojolicious's wildcard placeholder, you would think that just fetching the path parameter would suffice. Unfortunately, the wildcard placeholder does not really match everything: it matches everything except the URL's query string (here is part of my quest to find that out).

So the query parameters have to be read separately and appended to $path to rebuild the URL that was originally requested (lines 12 - 20).

Lines 21 - 26 do the actual rerouting. If the requested URL is found in the dictionary ($config), we read in the corresponding file and return its contents. If we can't find the requested URL, we just return an error message.
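The key-building part is the fiddliest bit, so it may be worth pulling it out into a plain function that can be tested on its own. Here is a minimal sketch in plain Perl, assuming the query parameters arrive as an ordered list of name/value pairs; request_key and page_for are hypothetical helper names, not part of Mojolicious. Using join also removes the need to chop off the trailing '&'.

```perl
use strict;
use warnings;

# Rebuild the dictionary key ("path?name=value&...") from the wildcard path
# and the query parameters, kept in the order they were received.
sub request_key {
    my ($path, @pairs) = @_;
    return $path unless @pairs;
    my @parts;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @parts, "$name=$value";
    }
    return $path . '?' . join('&', @parts);
}

# Look up the file that should be served for a request (undef if unknown).
sub page_for {
    my ($config, $path, @pairs) = @_;
    return $config->{ request_key($path, @pairs) };
}

my %config = (
    'zoeken'        => 't/Crawlers/zestacrawler_getlastpage_response.html',
    'zoeken?page=0' => 't/Crawlers/zestacrawler_getpage_response.html',
    'dummyrecept'   => 't/Crawlers/zestacrawler_getrecipe_response.html',
);
```

Like the server code, this key depends on the order of the parameters; sorting the pairs first would make zoeken?a=1&b=2 and zoeken?b=2&a=1 map to the same page, if that ever matters.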

The MockServerRunner
Mojo::Server's run method is blocking. That means that, if I wanted to start it from within the test, I had to run the server in a separate thread. This means that starting the server needed a little bit of code too, but since that's just boilerplate, I threw that in a separate module: the MockServerRunner:

  1. package MockServerRunner;
  2. use strict;
  3. use warnings;
  4. use threads;
  5. use Mojo::Server::Daemon;
  6. sub new {
  7.   my ($class, $config) = @_;
  8.   my $server = Mojo::Server::Daemon->new();
  9.   $server->load_app('t/Crawlers/MockServer')->SetConfig($config);
  10.   my $self = bless {THREAD => 0,
  11.                     SERVER => $server}, $class;
  12.   return $self;
  13. }
  14. sub start {
  15.   my $self = shift;
  16.   $self->{THREAD} = threads->create(sub {
  17.                                       my $self = shift;
  18.                                       $self->{SERVER}->run;
  19.                                     }, $self);
  20.   sleep(3);                     # fukit
  21.   $self->{THREAD}->detach;      # live long and prosper!
  22. }
  23. 1;

You can ignore line 20 :-)
I still need to figure out how I can detect that the server is actually running. But for now, it always starts in under three seconds.
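One possible replacement for the fixed sleep is to poll the server's port until a TCP connection succeeds, giving up after a timeout. A minimal sketch using only core modules (IO::Socket::INET and Time::HiRes); wait_for_port, the 0.1 second polling interval and the 10 second default timeout are my own assumptions. Keep in mind that a successful connect only proves that something is listening on the port.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Time::HiRes qw(sleep time);

# Poll until something accepts TCP connections on $host:$port.
# Returns 1 as soon as a connection succeeds, 0 once $timeout seconds have passed.
sub wait_for_port {
    my ($host, $port, $timeout) = @_;
    $timeout //= 10;
    my $deadline = time() + $timeout;
    while (time() < $deadline) {
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
            Timeout  => 1,
        );
        if ($sock) {
            close $sock;
            return 1;
        }
        sleep(0.1);    # Time::HiRes sleep accepts fractional seconds
    }
    return 0;
}
```

In MockServerRunner's start method, sleep(3) could then become something like wait_for_port('localhost', 3000) or die "mock server did not start\n"; (3000 being the port the crawler requests get rerouted to).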

Once the server is running I detach the thread (line 21), which essentially means that I don't care about it any more. It would be cleaner to send some sort of stop signal to the thread and then wait for it to exit via join, but that would require more code, shared data and the whole lot. I didn't want that.
I just want the server to exit when testing is done. So I detach, and I trust that Perl will clean up after me.
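For the record, if the clean variant is ever wanted, a forked child process may be easier to stop than a thread (and it also works on a perl built without thread support). A minimal sketch, assuming a POSIX system; ChildRunner is a hypothetical name, and the code ref passed to it stands in for sub { $server->run }. This swaps threads for fork, so it's a different technique from the MockServerRunner above.

```perl
use strict;
use warnings;

{
    # Hypothetical runner: executes a blocking code ref in a child process.
    package ChildRunner;

    sub new {
        my ($class, $code) = @_;
        return bless { code => $code, pid => undef }, $class;
    }

    sub start {
        my $self = shift;
        my $pid  = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: run the blocking code (e.g. the server's run loop).
            $self->{code}->();
            exit 0;
        }
        $self->{pid} = $pid;    # parent: remember whom to stop later
        return $pid;
    }

    # Terminate the child and reap it, so no zombie process is left behind.
    # Returns the child's wait status, or nothing if it wasn't running.
    sub stop {
        my $self = shift;
        my $pid  = delete $self->{pid} or return;
        kill 'TERM', $pid;
        waitpid($pid, 0);
        return $?;
    }

    # Also stop the child if the runner goes out of scope (e.g. a test dies);
    # local $? keeps waitpid from clobbering the script's exit status.
    sub DESTROY { local $?; $_[0]->stop }
}
```

Since MockServerRunner already calls SetConfig before the server starts, the child would inherit the configuration at fork time, so no shared data is needed; calling stop (or simply letting the runner go out of scope) kills and reaps the server at the end of the test.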

Rerouting the request
Okay, so my server is running and ready to serve the pages that I want to run my tests on. There is only one thing left now, and that is making sure that the requests are sent to the mock server, and not to the original website! That's where Moose comes in with its fantastic method modifiers! In this case I'll use before to decorate the get method of Mojo::UserAgent so that the request is re-routed to my mock server.

The code to do that is added to the initialisation of the test. It needs to be written exactly once for every crawler. The sample below is from the tests for the ZestaCrawler (the one that crawls zesta.be):

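  {
    package UserAgent;
    use Moose;
    extends 'Mojo::UserAgent';
    # send every zesta.be request to the mock server instead; this rewrites
    # the caller's URL in place, so pass it in a variable, not a literal
    before 'get' => sub {
      $_[1] =~ s/zesta\.be/localhost:3000/;
    };
  }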
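Why the caveat in that comment: a before modifier receives aliases to the caller's arguments, so editing $_[1] also edits the caller's own variable, and it would die with "Modification of a read-only value attempted" if get were ever called with a string literal. Moose's around modifier sidesteps this by handing a modified copy to the original method. Since that needs Moose, here is a dependency-free sketch of the same copy-and-delegate idea using a plain subclass; FakeAgent is a hypothetical stand-in for Mojo::UserAgent whose get simply echoes the URL it receives.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for Mojo::UserAgent: get() just echoes the URL.
    package FakeAgent;
    sub new { my $class = shift; return bless {}, $class }
    sub get { my ($self, $url) = @_; return "GET $url" }
}

{
    # Plain-Perl counterpart of an "around" modifier: rewrite a copy of the
    # URL, then delegate to the parent class's get() with the rewritten value.
    package ReroutingAgent;
    our @ISA = ('FakeAgent');

    sub get {
        my ($self, $url, @rest) = @_;    # copying out of @_ leaves the caller's value untouched
        $url =~ s/zesta\.be/localhost:3000/;
        return $self->SUPER::get($url, @rest);
    }
}

my $agent = ReroutingAgent->new;
```

With Moose, the equivalent would be around 'get' => sub { my ($orig, $self, $url, @rest) = @_; $url =~ s/zesta\.be/localhost:3000/; return $self->$orig($url, @rest); };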

The test
Now all that remains is the test itself, which should by now be a lot shorter!
Let's have a look:

  my $mockServer = MockServerRunner->new({'zoeken'        => 't/Crawlers/zestacrawler_getlastpage_response.html',
                                          'zoeken?page=0' => 't/Crawlers/zestacrawler_getpage_response.html',
                                          'dummyrecept'   => 't/Crawlers/zestacrawler_getrecipe_response.html'});
  $mockServer->start;

That's it! I want to test with three different pages, so I just pass the URLs and the corresponding files to the MockServerRunner and start it. Testing is now a breeze and my test files are a lot cleaner!
Setting up a new test (with a different url) is now reduced to creating an .html file with the desired contents and adding the url and the filename to the list that's passed to the MockServerRunner.

This method is of course far from perfect, but I really like it. It took me a couple of hours of work but I'm quite sure that it'll save time in the long run. Feel free to discuss!


ldx