Derick Rethans


Walking the London LOOP - part 9

Wed, 23/04/2014 - 10:14

Due to abysmal weather, we moved our walk to Monday this weekend. That was not much of a problem as it was Easter Monday.

loop9-d36_5167.jpg

We traveled back to Kingston and crossed the bridge over the Thames. Just before entering Bushy Park, we passed a few peculiar trees... they had loads of shoes in them! All a similar style as well. The first part through Bushy Park was open fields, with a few ponds with birds nesting in the reeds around them.

loop9-d36_5185.jpg

After a while we crossed a road and ended up at the Woodland Gardens. They consist of two plantations, the Pheasantry Plantation and the Waterhouse Plantation. We had a few issues finding the entrance to the first one, as they had slightly changed the gates. Of course, I have now fixed that in OpenStreetMap. Both "Plantations" are quite different. The first one has more open spaces, whereas the second one is much denser woodland. In both plantations, everything was very much in bloom.

When leaving the plantations we proceeded through Bushy Park and went the wrong way. That wasn't really bad, as we ended up at the Water Gardens, which only recently opened to the public.

After Bushy Park we were a bit thirsty and made a little detour to find a pub, The Windmill. The original idea was to have a quick meal there as well, but the menu was limited and the bar staff rather grumpy. We just stuck around for a pint, and made another detour to find lunch—the local Sainsbury's.

After picking up lunch we followed a few residential streets and found the Crane River, along which we would walk for almost all of the rest of the walk.

loop9-d36_5201.jpg

The park near the river had some good walking paths and it was also the place where we enjoyed our bought sandwiches, crisps and drinks on a bench, in the sun, with flocks of parakeets and airliners flying over.

loop9-d36_5206.jpg

After a while we came upon the Shot Tower, where they previously made "shot" for guns. Near the Shot Tower is also the entrance to the Crane Park Nature Reserve which we probably should have visited as well, but didn't.

After another small stretch of road, we entered Hounslow Heath and from there on we continued through Brazil Mill Woods, Donkey Wood and the Causeway all the way to the Great South West Road. I think we were lucky that it was a bank holiday, as there was nearly no traffic.

loop9-d36_5214.jpg

The end of the walk was at Hatton Cross station, near the end of Heathrow's runway 09R/27L. You would probably not be surprised that many planes flew over!

The weather was warm at 16-18°C, but also muggy. We were a lot sweatier than on previous walks.

The photos that I took on this section, as well as the photos of the other sections of the LOOP, are available as a Flickr set.

Categories: Open Source, PHP Community

Walking the London LOOP - part 7 and 8

Thu, 17/04/2014 - 10:29
Section 8

After lunch we came through Bourne Hall Park where section 7 ends and section 8 starts. With 7 being so short, there was plenty of time to also finish section 8.

loop8-d36_5140.jpg

The start of section 8 is also the source of the Hogsmill river, and the whole route of section 8 tries to follow the river as closely as possible. In some places that was not really possible, so we had some diversions through residential areas and, slightly smellier, through the Hogsmill Valley Sewage Treatment Works. But most of it was pleasant walking along the river.

loop8-d36_5151.jpg

Near the end we came to Kingston-Upon-Thames, where it was again a bit trickier to follow the river.

loop8-d36_5157.jpg

With the weather being so nice, we had to stop for a few pints at the end of the walk, at two of Kingston's riverside pubs, the Bishop and the Gazebo next door!

The weather was again very good, with 16-18°C and no clouds to be seen. We took just over four hours for the two sections that together were 20.3km long.

The photos that I took on this section, as well as the photos of the other sections of the LOOP, are available as a Flickr set.

Categories: Open Source, PHP Community

Cursors and the Aggregation Framework

Wed, 09/04/2014 - 10:29

With MongoDB 2.6 released, the PHP driver for MongoDB has also seen many updates to support the features in the new MongoDB release. In this series of articles, I will illustrate some of those.

In this article, I will introduce command cursors and demonstrate how they can be applied to aggregations. I wrote about the Aggregation Framework last year, but since then it has received a lot of updates and improvements. One of those improvements relates to how the Aggregation Framework (A/F) returns results. Before MongoDB 2.6, the A/F could only return one document, with all the results stored under the result key:

<?php
$m = new MongoClient;
$c = $m->demo->cities;

$pipeline = [
     [ '$group' => [
          '_id' => '$country_code',
          'timezones' => [ '$addToSet' => '$timezone' ]
     ] ],
     [ '$sort' => [ '_id' => 1 ] ],
];

$r = $c->aggregate( $pipeline );
var_dump( $r['result'] );
?>

This code would output something like:

array(242) {
  [0] =>
  array(2) {
     '_id' => string(2) "AD"
     'timezones' => array(1) { [0] => string(14) "Europe/Andorra" }
  }
  [1] =>
  array(2) {
     '_id' => string(2) "AE"
     'timezones' => array(1) { [0] => string(10) "Asia/Dubai" }
  }
  [2] =>
  array(2) {
     '_id' => string(2) "AF"
     'timezones' => array(1) { [0] => string(10) "Asia/Kabul" }
  }
  …

MongoCollection::aggregate() is implemented under the hood as a database command. The method in the PHP driver merely wraps this, but you can also call A/F through the MongoDB::command() method:

<?php
$m = new MongoClient;
$d = $m->demo;

$pipeline = [
     [ '$group' => [
          '_id' => '$country_code',
          'timezones' => [ '$addToSet' => '$timezone' ]
     ] ],
     [ '$sort' => [ '_id' => 1 ] ],
];

$r = $d->command( [
     'aggregate' => 'cities',
     'pipeline' => $pipeline,
] );
var_dump( $r['result'] );
?>

Because a database command only returns one document, the result is limited to a maximum of 16MB. This is not a problem for my example, but it can certainly be a limiting factor for other A/F queries.

MongoDB 2.6 adds support for returning a cursor for an aggregation command. With the raw command interface, you simply add the extra cursor element:

$r = $d->command( [
     'aggregate' => 'cities',
     'pipeline' => $pipeline,
     'cursor' => [ 'batchSize' => 1 ],
] );
var_dump( $r );

Instead of a document with all results inline, you get a cursor definition back:

array(2) {
  'cursor' =>
  array(3) {
     'id' => class MongoInt64#5 (1) {
          public $value => string(12) "392201189815"
     }
     'ns' => string(11) "demo.cities"
     'firstBatch' => array(1) {
       [0] =>
       array(2) {
          '_id' => string(2) "AD"
          'timezones' => array(1) { [0] => string(14) "Europe/Andorra" }
       }
     }
  }
  'ok' => double(1)
}

The cursor definition contains the cursor ID (in id), the namespace (ns), and whether the command succeeded (in ok). The definition also contains a portion of the results (in firstBatch). The number of items in firstBatch is configured by the value given to batchSize in the command.
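As a rough illustration of how such a reply is laid out, the following sketch reads the fields described above from a hand-written array. The array is only a stand-in for a real reply (the cursor ID is a plain string here rather than a MongoInt64 object), and describeCursorReply() is a hypothetical helper, not part of the driver:

```php
<?php
// Hand-written stand-in for the reply shown above; a real reply holds a
// MongoInt64 object under 'id' instead of a plain string.
$reply = [
     'cursor' => [
          'id' => '392201189815',
          'ns' => 'demo.cities',
          'firstBatch' => [
               [ '_id' => 'AD', 'timezones' => [ 'Europe/Andorra' ] ],
          ],
     ],
     'ok' => 1.0,
];

// Hypothetical helper: summarise a cursor definition, or fail if there is none
function describeCursorReply( array $reply )
{
     if ( empty( $reply['ok'] ) || !isset( $reply['cursor'] ) )
     {
          throw new RuntimeException( 'Command failed or did not return a cursor' );
     }

     $cursor = $reply['cursor'];

     return sprintf(
          'cursor %s on %s, %d document(s) in first batch',
          $cursor['id'], $cursor['ns'], count( $cursor['firstBatch'] )
     );
}

echo describeCursorReply( $reply ), "\n";
?>
```

With batchSize set to 1, as in the command above, the first batch holds exactly one document; the rest would have to be fetched through the cursor.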

To create a cursor that you can iterate over in PHP, you need to convert this cursor definition to a MongoCommandCursor object. You can do that with the MongoCommandCursor::createFromDocument() factory method. This factory method takes three arguments: the MongoClient object ($m in my example), the connection hash, and the cursor definition that was returned. The hash is required so that we can fetch new results from the same connection that executed the original command.

To obtain the connection hash, we need to include a by-ref variable as the third argument to MongoDB::command():

<?php
$m = new MongoClient;
$d = $m->demo;

$pipeline = [
     [ '$group' => [
          '_id' => '$country_code',
          'timezones' => [ '$addToSet' => '$timezone' ]
     ] ],
     [ '$sort' => [ '_id' => 1 ] ],
];

$r = $d->command(
     [
          'aggregate' => 'cities',
          'pipeline' => $pipeline,
          'cursor' => [ 'batchSize' => 1 ],
     ],
     null,
     $hash
);
var_dump( $hash );

The hash looks like localhost:27017;-;.;26415. Together with the result, you can now construct a MongoCommandCursor:

$cursor = MongoCommandCursor::createFromDocument( $m, $hash, $r );

And iterate over it:

foreach ( $cursor as $result )
{
     echo $result['_id'], ': ', join( ', ', $result['timezones'] ), "\n";
}
?>

As this is all a bit cumbersome, we have also added a helper method for this: MongoCollection::aggregateCursor. This internally does the whole MongoCommandCursor creation dance, and simplifies the previous example to:

<?php
$m = new MongoClient;
$c = $m->demo->cities;

$pipeline = [
     [ '$group' => [
          '_id' => '$country_code',
          'timezones' => [ '$addToSet' => '$timezone' ]
     ] ],
     [ '$sort' => [ '_id' => 1 ] ],
];

$r = $c->aggregateCursor( $pipeline );

foreach ( $r as $result )
{
     echo $result['_id'], ': ', join( ', ', $result['timezones'] ), "\n";
}
?>

This helper also automatically sets the initial batch size to 101. You can change the batchSize for subsequent batches by using the MongoCommandCursor::batchSize() method, and for the initial batch by specifying an option to MongoCollection::aggregateCursor:

$options = [ 'cursor' => [ 'batchSize' => 5 ] ];

$r = $d->cities->aggregateCursor( $pipeline, $options );
$r->batchSize( 25 );

In general, you probably should not change the default batch sizes.
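To get a feeling for why batch sizes matter, here is a deliberately idealised model of the number of round trips to the server. It assumes one round trip for the initial command and one for every further batch, ignores the server's limits on batch size in bytes, and assumes the cursor is closed as soon as the last document is delivered. The function estimateRoundTrips() is a hypothetical name, and 242 is simply the number of documents from the earlier example output:

```php
<?php
// Idealised model: one round trip for the initial command, plus one for
// each additional batch needed to deliver the remaining documents.
function estimateRoundTrips( $totalDocuments, $firstBatchSize, $nextBatchSize )
{
     $remaining = max( 0, $totalDocuments - $firstBatchSize );

     return 1 + (int) ceil( $remaining / $nextBatchSize );
}

echo estimateRoundTrips( 242, 101, 101 ), "\n"; // if both sizes were 101
echo estimateRoundTrips( 242, 5, 25 ), "\n";    // the sizes set above
echo estimateRoundTrips( 242, 1, 1 ), "\n";     // one document per batch
?>
```

Under these assumptions, very small batch sizes quickly multiply the number of round trips, which is one reason to be careful when changing them.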

The Aggregation Framework has some other new features in MongoDB 2.6 as well. Please refer to the release notes for more information. I might write another post on some of those features later, too.

Categories: Open Source, PHP Community

Walking the London LOOP - part 5 and 6

Tue, 01/04/2014 - 10:11
Section 5

While waking up we already knew this would be a glorious day. Blue skies with no clouds in sight. The moment I got out of the house I knew I was not going to need my coat either. Getting to Hamsey Green, the start of section 5, was a bit more of a chore than normal. It involved two tube trains to Victoria, a train to West Croydon and then another bus ride down the road to Ken's Auto at Hamsey Green.

loop5-d36_4980.jpg

After a short section next to a road, we entered Riddlesdown. With mostly open fields and a bit of woodland we made it down into the next valley, coming past a disused quarry. We only really noticed the quarry once we made it over a bridge across some railroad tracks and up a fairly steep path up the hill on the other side of the valley.

loop5-d36_4992.jpg

After some steps, and some more steep uphill parts, we came to Kenley Common, a now open space that used to be farmland, as a swap for the Kenley Aerodrome that the RAF requisitioned during the Second World War. We made a few wrong turns on Kenley Common and there were a few slightly useless fences. Passing through some woods and a field with gliders overhead, we "suddenly" found ourselves at the Wattenden Arms, a pub displaying much WWII memorabilia from the Kenley Aerodrome. The friendly staff served a decent pint, and after refreshing ourselves we continued the walk.

After climbing our first stile we were overtaken by another LOOP walker as we passed by the Kenley Observatory and a friendly horse. For a bit we had to walk along a road without footpath or pavement.

loop5-d36_5007.jpg

After that we passed by a field with a sole postbox and then made our way to Happy Valley. With the Sun blazing and everything looking greener than it probably was, we descended into the valley and back out on the other end. The signing of the LOOP was a bit confusing so I don't think we followed the route correctly, but we picked up the walk again just shy of the next common, Farthing Downs.

This part of the walk was over a hill crest with the skyline of London in the far background. The section ended with a slight downhill into Coulsdon, where we stopped for some refreshments (most importantly cake) at the Poppy Cafe. Because the weather was so nice, we decided to continue with the following section as well, section 6.

Section 6

loop6-d36_5021.jpg

Passing through South Coulsdon station we had a long climb up a residential road before we continued on a bridleway. After a long section through some woods with farmland around, and a slight detour around a road without pavement, we came upon the Mayfield Lavender Fields. Sadly, we were too early to see it all in bloom, but there was most definitely already a hint of purple to be seen.

loop6-d36_5052.jpg

We then walked through Oak's Park, after which there was another long straight section on the edge of Surrey that took us past HMP Highdown. Luckily most of it was hidden by hedges and trees. The last part of this much shorter section took us to the Banstead Downs and over the Banstead Downs Golf Club to the end of the walk. From there it was a short link to Banstead, where we luckily only had to wait 20 minutes for the train, as there is only a service every hour.

Where section 5 was mostly known for its ups and downs, section 6 was the "horse" section. Lots of bridleways and horses around.

The weather was very good, with 16-18°C and no clouds to be seen. We took nearly four and a half hours for the two sections that together were 19.4km long.

The photos that I took on this section, as well as the photos of the other sections of the LOOP, are available as a Flickr set.

Categories: Open Source, PHP Community

Walking the London LOOP - part 4

Tue, 25/03/2014 - 11:09

Another weekend, and another section of the LOOP. This time Morag and I left home a bit earlier as we knew this was one of the longer sections of the LOOP at 9 miles.

loop4-d36_4734.jpg

We took the train to Hayes (Kent) and followed a slightly different route to the start of the section. At the end of the last one we really could have done without the two fairly steep hills. After getting to the start, we soon found ourselves on the Greenwich Meridian, even though the GPS indicated crossing the line about 200 meters earlier. Passing St. John's church, the LOOP wanted to take us right through a "lake", previously the Sparrows Den Playing Fields, but currently flooded due to high levels of ground water. Some jokers had also put a bunch of yellow rubber ducks on the "lake".

loop4-d36_4753.jpg

We found our way around the field and continued towards our first wooded section, afraid of more mud. Instead, we were greeted by a collapsed tree on the path. Some mud did show up, but not nearly as much as on previous sections. We came out of the woods and had to follow a decent stretch along a road, then past a "high school" and its playing fields until we came past a promising looking pub, The Sandrock. Although it was open, it was so quiet in there that we continued by climbing up the Addington Hills to treat ourselves to a fine panorama over London. Wembley Stadium, the City and Canary Wharf were all very easy to spot.

loop4-d36_4917.jpg loop4-d36_4933.jpg

After a quick break and avoiding having Chinese food we came to the Coombe Lane Tramlink station, after which we disappeared into more woods, this time around Heathfield House and Bramley Bank. From there we continued onwards towards more woods (can you believe it!) and around a water tower.

By now, we were definitely hungry (and thirsty) so we decided to make a slight detour into Selsdon to have a bite and a pint at The Sir Julian Huxley, a Wetherspoons.

loop4-d36_4966.jpg

After lunch we continued our walk by going through more woods: Selsdon Wood and Puplet Wood. For the first time, we went just outside of Greater London into Surrey. After encountering Elm Farm in Farleigh we fled back into London past some fields to make it to Hamsey Green, the end of the walk. If the previous sections could be called "muddy", this section clearly had a preference for "woods". A bus, train and two tubes later we got home, exhausted.

The weather was mostly good, but colder at 10-12°C and some rain threatened to wet us near the start. We took just over four and a half hours for the 21.1km walk (including detours).

The photos that I took on this section, as well as the photos of the other sections of the LOOP, are available as a Flickr set.

Categories: Open Source, PHP Community

Walking the London LOOP - part 3

Tue, 11/03/2014 - 11:23

After section 2's really muddy walk we hoped for better paths for LOOP section 3. As is becoming customary, we started our walk at Jack's in Queen's Park. Not really the walk of course, but a hearty breakfast. Interestingly the place seemed to be crawling with police officers this morning. After breakfast, we were also happy to see that the Queen's Park Panda was back again—it seemed to have gotten a wash.

loop3-d36_4644.jpg

After making our way by rail to Petts Wood—without weekend engineering getting in the way—we proceeded towards Jubilee Country Park for section 3. After going in the wrong direction straight from the start, we managed to find our way and were happy to see that there was not too much mud. And where there was mud, some handy wooden bridges were provided.

loop3-d36_4646.jpg

It was a gorgeously warm day, and it felt pretty much well into spring. The trees were getting into blossom and the sky was blue. After going through a bit of town, the muddy paths returned once we hit woods. Not quite the bogs from section 2, but we couldn't ignore their muddiness altogether either. The route took us through Darrick Wood (with a bit more mud), and after going in the wrong direction a tiny bit (OpenStreetMap was wrong) a view opened up over Farnborough's fields.

loop3-d36_4658.jpg

Leaving the fields behind us we continued through Farnborough until we heard the sound of a violin. The route then continued through the cemetery of St. Giles the Abbot. Besides it being a lovely church and a cemetery full of daffodils, it also sported a war memorial for the First and Second World Wars.

Not long after leaving the cemetery behind us we made it to High Elms Park, where we stopped for lunch at the Green Roof Cafe. Some scones and a refreshing cider gave us enough energy to continue our way. The first thing we got to admire was the Bromley Millennium Rock, apparently one of the oldest rocks of the British Isles, a mere 2 billion years old.

After passing by the ruins of an old manor, and The Clock House, we climbed up a hill. At the bottom of the hill, the path showed a lot of erosion from running water. We soon turned off the narrow dirt track onto the side of a field which was very pleasant to walk on. Definitely a lot better than the path that we just left, as that had turned into a little lake as you can see below.

loop3-d36_4692.png

A bit later we walked around a field where the Metropolitan Police trains their dogs, and we were often reminded not to get onto their field. The route took us up a hill through the Holwood estate with Holwood House on the top of the hill. The house was owned by William Pitt the Younger, one of Great Britain's prime ministers.

loop3-d36_4708.jpg

Near the top, we encountered the stump of an oak tree, with a new tree growing in the middle of it. The original oak tree, the Wilberforce Oak, is quite significant in the abolition of the slave trade in the British Empire, and there is a nearby plaque quoting from William Wilberforce's diary stating: "At length, I well remember after a conversation with Mr. Pitt in the open air at the root of an old tree at Holwood, just above the steep descent into the vale of Keston, I resolved to give notice on a fit occasion in the House of Commons of my intention to bring forward the abolition of the slave-trade".

loop3-d36_4712.jpg

Down the hill we walked into Keston Common which has a few lovely ponds that you can fish in. Although the paths around it were a bit muddy it is an excellent spot to find some ice cream. Through some woods we came to a road, with a pub on the other side of it. After climbing up the hill, and down again, we were certainly thirsty and stopped for a pint at gastro pub The Fox Inn. It has a lovely interior and the beer garden was teeming with locals enjoying the sunshine.

loop3-d36_4725.jpg

The walk then took us through Hayes Common, and with the Sun nearly setting all we had to do was go up and down two steep hills to make it to Hayes station. Perhaps we should take a different route when we start with section 4!

The weather was beautiful at 15-18°C and there was nothing but blue skies. We took just over four hours for the 19.7km walk.

The photos that I took on this section, as well as the photos of the other sections of the LOOP, are available as a Flickr set.

Categories: Open Source, PHP Community

Walking the London LOOP - part 2

Tue, 04/03/2014 - 11:23

Back in October 2013, Morag and I started walking the London LOOP with section 1. It took nearly four months before we embarked on the second section. The delay was mostly caused by the short days and the terrible weather we have had during the winter. But with the Sun returning and the days getting longer it was time to do part 2: Bexley to Jubilee Country Park. We originally intended to do the walk on February 15th, but all the trains towards Kent were buggered due to lots of fallen trees caused by the latest storm, hence our second attempt last Sunday. Why I thought it was a good idea to do this right after the PHP UK Conference is a bit of a mystery to me still.

Of course, the travel to the start of the walk was not as straightforward as it could be, with lots of engineering work and bus replacement services. We ended up taking the train to Barnehurst and bussing it to Bexley, which was much better than the route that was suggested by National Rail Enquiries: train from Charing Cross to Plumstead, bus to Dartford, and then the train back to Bexley.

loop2-d36_4581.jpg

From Bexley station we crossed under the railway and headed South. Just before getting to the river Cray, we walked through some woodland where a fair number of trees had not survived the winter storms. Joining the river after a mile or so we noticed that it was still very high, and rather fast flowing. In fact, it was so high that many of the paths were either flooded or very muddy. The muddy path opened up into the Stable and Footscray Meadows and a very lovely bridge, the Five Arches, crossing over the river Cray.

loop2-d36_4585.jpg loop2-d36_4589.jpg

I think there was a bit more water in the meadows than there usually was. Or perhaps the locals tried to create an extra lake. In any case, there was no dry way out of the meadows into the direction we had to be going. The photo to the right just points out how much water we had to wade through. I estimate it was about 3 inches deep.

loop2-d36_4604.jpg

Past the meadows and All Saints Church we stopped in Sidcup for lunch. Although the route goes past a pub, we decided to skip it and instead just pick a cafe in Sidcup itself. But not before we encountered this friendly horse.

Sidcup itself seems like a little village with not much going on, but lunch at Urban Food was decent. After filling up we continued the walk by finding Sidcup Place and crossing the A20 into Scadbury Park.

loop2-d36_4611.jpg

Scadbury Park is rather large and a local nature reserve. It is also old and has a ruined moated manor in the middle, which was owned by the Scathebury family. The LOOP as mapped on OpenStreetMap had the route go past the manor, but I found that was incorrect as shown by the sign posts on the route.

loop2-d36_4616.jpg loop2-d36_4631.jpg

From Scadbury Park we crossed into Petts Wood, or rather perhaps it should be called the Petts Mud Flats as there was nearly no space without mud, sometimes up to half a foot deep. An indication of the amount is visible in the image to the right. I don't think I've ever had this much mud on a walk actually.

loop2-d36_4639.jpg

This meant that we were a bit delayed and we would just miss the train home from Petts Wood station. When coming out of the woods we crossed multiple sets of train tracks. The section as shown on the left is just before the end point of the second section in the Jubilee Country Park.

The weather was colder than the first section, but that was no surprise as it is February. At around 8°C there was a fair bit of wind, but we stayed dry.

For the full photo series of the LOOP, see my Flickr set.

Categories: Open Source, PHP Community

DateTimeImmutable

Tue, 25/02/2014 - 16:41

My improved DateTime support first officially made its way into PHP in PHP 5.1, although the more advanced features such as the DateTime class only made their appearance in PHP 5.2. Since its introduction, the DateTime class implementation has suffered from one design mistake, arguably not something that even an RFC would have highlighted.

In PHP, if you do the following:

<?php
function formatNextMondayFromNow( DateTime $dt )
{
        return $dt->modify( 'next monday' )->format( 'Y-m-d' );
}

$d = new DateTime();
echo formatNextMondayFromNow( $d ), "\n";
echo $d->format( 'Y-m-d' ), "\n";
?>

It displays:

2014-02-17
2014-02-17

The modify() method does not only return the modified DateTime object, but also changes the DateTime object it was called on. In an API like above, this is of course totally unexpected and the only way to avoid this behaviour is to do the following instead:

echo formatNextMondayFromNow( clone $d ), "\n";

This mutability property that all modifying methods of the DateTime class have is highly annoying, and something that I would now rather remove. But of course we cannot as that would break backwards compatibility.

So in PHP 5.5, after a few stumbles, I finally managed to rectify this. I did not change the original class's behaviour, but instead, I added a new class. The new DateTimeImmutable class does not display this "mutable" behaviour, and only returns the modified object:

<?php
function formatNextMondayFromNow( DateTimeImmutable $dt )
{
        return $dt->modify( 'next monday' )->format( 'Y-m-d' );
}

$d = new DateTimeImmutable();
echo formatNextMondayFromNow( $d ), "\n";
echo $d->format( 'Y-m-d' ), "\n";
?>

Which displays:

2014-02-17
2014-02-11

Both the old DateTime and the new DateTimeImmutable classes implement the DateTimeInterface interface. This interface defines all the methods that both classes implement. Of course, they cannot be methods that change the object. That is why the interface is restricted to formatting and other read-only methods, such as getTimezone.
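A minimal sketch of what this allows: a function that type-hints on DateTimeInterface accepts objects of either class, as long as it only uses the read-only methods. The function name formatAsIsoDate() and the fixed date are made up for this illustration:

```php
<?php
// Accepts both DateTime and DateTimeImmutable, as both implement the interface
function formatAsIsoDate( DateTimeInterface $dt )
{
        return $dt->format( 'Y-m-d' );
}

$utc = new DateTimeZone( 'UTC' );

$mutable   = new DateTime( '2014-02-11', $utc );
$immutable = new DateTimeImmutable( '2014-02-11', $utc );

echo formatAsIsoDate( $mutable ), "\n";
echo formatAsIsoDate( $immutable ), "\n";
?>
```

Both calls print 2014-02-11, and neither object is changed because format() only reads from it.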

Perhaps in the future (PHP 6), we can replace the mutable DateTime class with the new DateTimeImmutable variant. Until then, you will have to take care of this yourself!
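One possible way of taking care of it yourself is to convert a mutable object into an immutable copy at the start of a function. The sketch below assumes PHP 5.6 or later, which is where DateTimeImmutable::createFromMutable() became available; the fixed date is only there to make the result predictable:

```php
<?php
function formatNextMondayFromNow( DateTimeInterface $dt )
{
        // Work on an immutable copy, so that the caller's object is never changed.
        // DateTimeImmutable::createFromMutable() requires PHP 5.6 or later.
        if ( $dt instanceof DateTime )
        {
                $dt = DateTimeImmutable::createFromMutable( $dt );
        }

        return $dt->modify( 'next monday' )->format( 'Y-m-d' );
}

$d = new DateTime( '2014-02-11', new DateTimeZone( 'UTC' ) );
echo formatNextMondayFromNow( $d ), "\n";
echo $d->format( 'Y-m-d' ), "\n";
?>
```

This prints 2014-02-17 followed by 2014-02-11: the caller's DateTime object keeps its original value without having to be cloned at every call site.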

Categories: Open Source, PHP Community

Type juggling with MongoDB

Tue, 18/02/2014 - 11:41

As a PHP developer, you likely know that all GET and POST variables are represented as strings through the $_GET and $_POST super globals. PHP's weak typing system allows you to do calculations with numbers that are stored in strings as well as with normal numbers. For example, the following works just fine, and outputs 131.88:

<?php
$_GET['life'] = "42";
echo $_GET['life'] * "3.14", "\n";
?>

Similarly we can compare a number stored in a string with a real number just as easily:

<?php
$a = "1701";
$b = 1701;

echo $a == $b ? "the same" : "similar", "\n";
?>

Which echoes "the same".
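For contrast, a short sketch of what happens when the type is taken into account as well. PHP's strict comparison operator (===) treats the string and the integer as different values, which is close to how MongoDB looks at them later in this article:

```php
<?php
$a = "1701";
$b = 1701;

var_dump( $a == $b );  // loose comparison: the string is converted to a number
var_dump( $a === $b ); // strict comparison: the types differ
?>
```

This prints bool(true) followed by bool(false).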

When storing data in a relational database, you should always take care of escaping data, or, of course use prepared statements. As an example, you would store the same GET parameter from above with something like:

<?php
$db = new PDO( "mysql:host=localhost;dbname=test" );
$stmt = $db->prepare( "INSERT INTO test(value) VALUES(?)" );
$stmt->bindParam( 1, $_GET['life'] );
$stmt->execute();
?>

Because the arguments to prepared statements are not sent as part of the string, the database will just cast it to the type that is defined in the database through CREATE TABLE.

And to select this record again, we can run a SELECT query, again either with the number in the string, or as a PHP integer:

<?php
$db = new PDO( "mysql:host=localhost;dbname=test" );
$stmt = $db->prepare( "SELECT * FROM test WHERE value = ?" );
// bindValue() accepts literals; bindParam() only accepts a variable, as it binds by reference
$stmt->bindValue( 1, "42" );
// or
$stmt->bindValue( 1, 42 );
$stmt->execute();
?>

And in both cases it will find the record, again because the database knows through its table schema which type the value is stored as.

In SQL, prepared statements are required to prevent SQL injections. If you look at MongoDB, you will see that because we are not creating an SQL string to be executed against the database, we do not have to be worried about (No)SQL injections:

<?php
$m = new MongoClient;
$m->demo->test->insert( [ 'value' => $_GET['life'] ] );
?>

Because MongoDB is schemaless, the type of each field has not been defined and hence you can store the value as any type you would like:

<?php
$m = new MongoClient;
$m->demo->test->insert( [ 'value' => "42" ] );
$m->demo->test->insert( [ 'value' => 42 ] );
$m->demo->test->insert( [ 'value' => 42.0 ] );
?>

MongoDB stores data as typed values, and the above three inserts result in the following three documents in the database (as seen through the MongoDB Shell):

> db.test.find();
{ "_id" : ObjectId("52f5691544670a8077b0dc51"), "value" : "42" }
{ "_id" : ObjectId("52f5691544670a8077b0dc52"), "value" : NumberLong(42) }
{ "_id" : ObjectId("52f5691544670a8077b0dc53"), "value" : 42 }

Because they are stored as three different documents, essentially all three with a different schema each, it becomes important to use the correct type when querying the data as well.

To find the string and integer variant, we have to run:

<?php
$m = new MongoClient;
var_dump( $m->demo->test->findOne( [ 'value' => "42" ] )['value'] );
var_dump( $m->demo->test->findOne( [ 'value' => 42 ] )['value'] );
?>

Which outputs:

string(2) "42"
int(42)

However, if we want to find the document where value is stored as a floating point number we fail to do that with:

<?php
$m = new MongoClient;
var_dump( $m->demo->test->findOne( [ 'value' => 42.0 ] ) );
?>

Which outputs:

array(2) {
  '_id' =>
  class MongoId#6 (1) {
        public $$id =>
        string(24) "52f5691544670a8077b0dc52"
  }
  'value' =>
  int(42)
}

As you can see it finds the variant with the integer value first. When using find() to find all documents that match the query, we see the document with the floating point value turn up as well:

<?php
$m = new MongoClient;
foreach ( $m->demo->test->find( [ 'value' => 42.0 ] ) as $r )
{
        var_dump( $r['value'] );
}
?>

Which outputs:

int(42)
double(42)

It is possible to get only the document back where the value is a floating point number by also requiring that the value is stored as a double (BSON type 1):

<?php
$m = new MongoClient;
$r = $m->demo->test->findOne(
        [ '$and' => [
                [ 'value' => 42.0 ],
                [ 'value' => [ '$type' => 1 ] ]
        ] ]
);
var_dump( $r['value'] );
?>

Which then outputs the expected:

double(42)

The values for the $type operator can be found in the MongoDB documentation.
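To avoid magic numbers in queries like the one above, the type numbers can be given names. The sketch below lists the BSON type numbers for the types used in this article; the constant names and the buildTypedQuery() helper are made up for this illustration and are not part of the driver. It only builds the query array, which you would then pass to find() or findOne():

```php
<?php
// BSON type numbers as accepted by the $type query operator
const BSON_TYPE_DOUBLE = 1;
const BSON_TYPE_STRING = 2;
const BSON_TYPE_INT32  = 16;
const BSON_TYPE_INT64  = 18;

// Hypothetical helper: match a value only when it is stored with the given BSON type
function buildTypedQuery( $field, $value, $bsonType )
{
        return [ '$and' => [
                [ $field => $value ],
                [ $field => [ '$type' => $bsonType ] ],
        ] ];
}

$query = buildTypedQuery( 'value', 42.0, BSON_TYPE_DOUBLE );
var_dump( $query );
?>
```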

Conclusion

MongoDB stores values in the type that values have been inserted with. Unlike relational databases, MongoDB does not coerce this into the "defined" type, as there is no type defined.

To find documents, make sure that you use the same type as what you inserted the value as, otherwise MongoDB will not find the document. The exception here is that the three numerical types (32 bit integer, 64 bit integer and double) are interchangeable, as long as the value is the same.
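In practice this means converting request parameters to the intended type before they end up in an insert or a query. A minimal sketch, assuming the field is meant to hold integers; buildValueQuery() is a hypothetical helper that only builds the query array and does not talk to MongoDB:

```php
<?php
// Hypothetical helper: turn a raw request value into a query on an integer field
function buildValueQuery( $raw )
{
        // FILTER_VALIDATE_INT returns an int on success, and false for
        // anything that is not an integer (including arrays)
        $value = filter_var( $raw, FILTER_VALIDATE_INT );

        if ( $value === false )
        {
                throw new InvalidArgumentException( 'Not an integer value' );
        }

        return [ 'value' => $value ];
}

var_dump( buildValueQuery( "42" ) );
?>
```

The resulting query contains the integer 42 rather than the string "42", so it matches the documents that store the value as a number, but not the one that stores it as a string.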

Categories: Open Source, PHP Community

MongoDB and arbitrary key names

Tue, 11/02/2014 - 11:41

I hang out on the MongoDB IRC channel (Freenode/#mongodb) quite a bit, and a frequently recurring question is, at its core, about storing values as keys.

One of the examples is storing timed data points like this:

{
        person: "derickr",
        steps_made: {
                "20140201": 10800,
                "20140202":  5906,
        }
}

The sub key under steps_made is the date on which the steps were made, and the value is the number of steps for that day. A schema like this prevents the same key from being used multiple times, which enforces uniqueness, and it also makes it possible to add steps from future walks atomically:

<?php
$m = new MongoClient;
$m->demo->steps->update(
        [ 'person' => "derickr" ],
        [ '$inc' => [ "steps_made.20140202" => 712 ] ],
        [ 'upsert' => true ]
);
?>

Because of the upsert option, this update query would even work if there was no document for this person yet in the collection, and it would also work in case a specific date did not have an entry yet.
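
What the upsert combined with $inc does can be sketched with a plain PHP array standing in for the collection. The upsertIncrement() helper and the step counts are made up for illustration; this is not how the driver or the server implements it:

```php
<?php
// Hypothetical in-memory stand-in for an upsert with $inc on a date key.
function upsertIncrement( array &$collection, $person, $date, $steps )
{
        // The "upsert" part: create the document if it does not exist yet.
        if ( !isset( $collection[$person] ) )
        {
                $collection[$person] = [ 'person' => $person, 'steps_made' => [] ];
        }

        // $inc on a missing field starts counting from zero.
        if ( !isset( $collection[$person]['steps_made'][$date] ) )
        {
                $collection[$person]['steps_made'][$date] = 0;
        }

        $collection[$person]['steps_made'][$date] += $steps;
}

$collection = [];
upsertIncrement( $collection, 'derickr', '20140202', 5194 ); // creates the document
upsertIncrement( $collection, 'derickr', '20140202', 712 );  // adds a later walk

var_dump( $collection['derickr']['steps_made'] );
```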

Although it seems sensible to store the data like this, the schema has a few problems.

The first problem is that this schema makes it impossible to find the step counts for a range of dates, either with a normal query or with the aggregation framework. This is because you cannot query on key names. You would have to request whole documents, and pick out the correct keys in the application.

You can also not easily find which dates had a step count larger than 10000, because you would need to know the key name (date) for the comparison first.

Another problem is indexes. Indexes can only be made on field names. To be able to use an indexed lookup with the above schema, you would need to create an index on steps_made.20140201 and on steps_made.20140202, etc. That is of course not practical.

And lastly, with the schema above you cannot create aggregates on years or months as, again, you can only manipulate values with the aggregation framework, not keys.
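
To illustrate the first two problems: with dates as keys, the filtering has to happen in the application after fetching the whole document. A minimal sketch, in which the helper names stepsInRange() and datesAbove() are hypothetical and the document is hard-coded instead of being fetched from MongoDB:

```php
<?php
// One document in the "date as key" schema. PHP silently turns numeric string
// keys such as "20140201" into integers, hence the (string) casts below.
$document = [
        'person' => 'derickr',
        'steps_made' => [
                '20140201' => 10800,
                '20140202' =>  5906,
        ],
];

// Hypothetical helper: picks the days from $from up to (but excluding) $until.
function stepsInRange( array $document, $from, $until )
{
        $result = [];
        foreach ( $document['steps_made'] as $date => $steps )
        {
                $date = (string) $date;
                if ( strcmp( $date, $from ) >= 0 && strcmp( $date, $until ) < 0 )
                {
                        $result[$date] = $steps;
                }
        }
        return $result;
}

// Hypothetical helper: lists the dates with more than $threshold steps.
function datesAbove( array $document, $threshold )
{
        $dates = [];
        foreach ( $document['steps_made'] as $date => $steps )
        {
                if ( $steps > $threshold )
                {
                        $dates[] = (string) $date;
                }
        }
        return $dates;
}

var_dump( stepsInRange( $document, '20140202', '20140301' ) );
var_dump( datesAbove( $document, 10000 ) );
```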

So what is the major issue here? The schema uses a value as a key. This is like having an XML schema that looks like this:

        <person>derickr</person>
        <steps_made>
                <20140201>10800</20140201>
                <20140202>712</20140202>
        </steps_made>


This hopefully makes your eyes bleed.

For much the same reasons that you wouldn't do this in XML (validation, XPath queries), you should not do this with MongoDB.

One alternative is to store the data like this:

{
        person: "derickr",
        steps_made: [
                { date: "20140201", steps: 10800 },
                { date: "20140202", steps:  5906 },
        ]
}

With this schema some of the previous issues go away.

You can create an index on the date field:

<?php
$m = new MongoClient;
$m->demo->steps->ensureIndex( [ 'steps_made.date' => 1 ] );
?>

And you can create an aggregation for the average per month:

<?php
$m = new MongoClient();
$r = $m->demo->steps->aggregate( [
        [ '$match' => [ 'person' => 'derickr' ] ],
        [ '$unwind' => '$steps_made' ],
        [ '$project' => [
                'person' => 1,
                'steps_made'=> '$steps_made.steps',
                'month' => [ '$substr' => [ '$steps_made.date', 0, 6 ] ]
        ] ],
        [ '$group' => [
                '_id' => '$month',
                'avg' => [ '$avg' => '$steps_made' ]
        ] ],
] );
var_dump( $r['result'] );
?>

It is still not possible to find only the step counts for a range of dates, as they are collectively stored in one document and you would always get a whole document back.

You cannot easily find which dates had a step count larger than 10000 with a normal query. However, you can do that with the aggregation framework, albeit not in a very efficient way:

<?php
$m = new MongoClient();
$r = $m->demo->steps->aggregate( [
        [ '$match' => [ 'person' => 'derickr' ] ],
        [ '$unwind' => '$steps_made' ],
        [ '$match' => [ 'steps_made.steps' => [ '$gt' => 10000 ] ] ]
] );
foreach( $r['result'] as $record )
{
        echo $record['steps_made']['date'], "\n";
}
?>

An additional problem with storing the step count for all the days in the same document is that the documents keep growing and growing when new days are added. It is unlikely that you will hit the 16MB document limit soon, as it would take about 1050 years worth of "step data", but in general the recommendation is to avoid having such a data structure. Growing documents also mean that they will need to be moved around on disk a lot, which is not good for performance.

In the last two aggregation framework queries you see a common theme: an $unwind. This is to break up each document into documents that each represent a single day. If we store the data like that ourselves, these aggregation framework queries, as well as other queries, become easier.
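
What the $unwind, $project and $group stages do to a document can be sketched in plain PHP. The helpers unwindSteps() and averagePerMonth() are hypothetical and only imitate the stages on a hard-coded document; they are not part of the driver:

```php
<?php
// Hypothetical helper: turns one document with an array of days into one
// document per day, like the $unwind stage does for the steps_made field.
function unwindSteps( array $document )
{
        $unwound = [];
        foreach ( $document['steps_made'] as $day )
        {
                $copy = $document;
                $copy['steps_made'] = $day;
                $unwound[] = $copy;
        }
        return $unwound;
}

// Hypothetical helper: groups the unwound documents per month (the first six
// characters of the date) and averages the step counts.
function averagePerMonth( array $unwound )
{
        $perMonth = [];
        foreach ( $unwound as $doc )
        {
                $month = substr( $doc['steps_made']['date'], 0, 6 );
                $perMonth[$month][] = $doc['steps_made']['steps'];
        }

        $averages = [];
        foreach ( $perMonth as $month => $steps )
        {
                $averages[$month] = array_sum( $steps ) / count( $steps );
        }
        return $averages;
}

$document = [
        'person' => 'derickr',
        'steps_made' => [
                [ 'date' => '20140201', 'steps' => 10800 ],
                [ 'date' => '20140202', 'steps' =>  5906 ],
        ],
];

$unwound = unwindSteps( $document );
var_dump( averagePerMonth( $unwound ) );
```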

In our second alternative we therefore store the data like:

{
        person: "derickr",
        date: "20140201",
        steps: 10800,
}
{
        person: "derickr",
        date: "20140202",
        steps: 5906,
}

Adding steps for a single walk (and creating a new document for a new day) is mostly the same:

<?php
$m = new MongoClient;
$m->demo->steps->update(
        [ 'person' => 'derickr', 'date' => "20140201" ],
        [ '$inc' => [ 'steps' => 712 ] ],
        [ 'upsert' => true ]
);
?>

Finding the step count for a range of dates is now possible, and rather trivial:

<?php
$m = new MongoClient;
$r = $m->demo->steps->find( [
        'date' => [ '$gte' => "20140201", '$lt' => "20140301" ]
] );
?>

Compared to the first alternative, the application doesn't need to filter anything out of the returned document either.

Because we don't have to unwind on the steps_made field while aggregating per-month, calculating the average is now simpler as the following aggregation framework query shows:

<?php
$m = new MongoClient();
$r = $m->demo->steps->aggregate( [
        [ '$project' => [
                'person' => 1,
                'steps'=> 1,
                'month' => [ '$substr' => [ '$date', 0, 6 ] ]
        ] ],
        [ '$group' => [
                '_id' => '$month',
                'avg' => [ '$avg' => '$steps' ]
        ] ],
] );
var_dump( $r['result'] );
?>

And finding which days saw more than 10000 steps is now done with a trivial query:

<?php
$m = new MongoClient;
$r = $m->demo->steps->find( [
        'person' => 'derickr',
        'steps' => [ '$gt' => 10000 ]
] );
?>

So unless you have a requirement where you need to show all the step counts of one person, I would recommend the second alternative as it is the most flexible, and will likely provide the best performance. There are however (other) use cases where the first alternative option makes sense, but I will get back to that in a future article.

Categories: Open Source, PHP Community

Understanding Valgrind errors (1)

Tue, 04/02/2014 - 11:22

While debugging segmentation faults (crashes) in PHP and its extensions, I often use Valgrind to help me find the root cause. Valgrind is an instrumentation framework for building dynamic analysis tools. It contains several tools, and its Memcheck tool is the one that detects memory-management problems. Memcheck is really valuable for C and C++ developers and something you should learn, especially when you write PHP extensions.

Memcheck's error messages can sometimes be difficult to understand, so with this (infrequent) series, I hope to shed some light on them.

Let's have a look at the following Valgrind error output, which I encountered while debugging issue PHP-963 of the MongoDB driver for PHP:

==18500== Invalid read of size 8
==18500==    at 0xC5FB7F1: zim_MongoCursor_info (cursor.c:866)
==18500==    by 0x9AF93F: execute_internal (zend_execute.c:1480)
==18500==    by 0xBF1B8FD: xdebug_execute_internal (xdebug.c:1565)
==18500==    by 0x9B0715: zend_do_fcall_common_helper_SPEC (zend_vm_execute.h:645)
==18500==    by 0x9B0D88: ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER (zend_vm_execute.h:756)
==18500==    by 0x9AFCAA: execute (zend_vm_execute.h:410)
==18500==    by 0xBF1B47C: xdebug_execute (xdebug.c:1453)
==18500==    by 0x976CC6: zend_execute_scripts (zend.c:1315)
==18500==    by 0x8F3340: php_execute_script (main.c:2502)
==18500==    by 0xA172B9: do_cli (php_cli.c:989)
==18500==    by 0xA1825E: main (php_cli.c:1365)
==18500==  Address 0x38 is not stack'd, malloc'd or (recently) free'd

An Invalid read means that the memory location that the process was trying to read is outside of the memory addresses that are available to the process. size 8 means that the process was trying to read 8 bytes. On 64-bit platforms this could be a pointer, but also for example a long int.

The last line of the error report says Address 0x38 is not stack'd, malloc'd or (recently) free'd, which means that the process was trying to read 8 bytes starting at address 0x38. The line also says that the address is not part of stack space or heap space (malloc), and that it was not recently a valid memory location either.

A very low address, such as 0x38 (56 in decimal), combined with size 8 often indicates that you tried to dereference an element of a struct which was pointed to by a NULL pointer.

When I checked line 866 of cursor.c I saw:

add_assoc_string(return_value, "server", cursor->connection->hash, 1);

cursor is a struct of type mongo_cursor which is defined as:

typedef struct {
        zend_object std;

        /* Connection */
        mongo_connection *connection;
        …
} mongo_cursor;

The zend_object std; is a general struct that is part of every overloaded object (overloaded in the PHP Internals sense) and contains four pointers that make up the first 32 bytes. This means that connection starts at offset 32.

The connection member is a struct of type mongo_connection that is defined as:

typedef struct _mongo_connection
{
        time_t last_ping;         //  0
        int    ping_ms;           //  8
        int    last_ismaster;     // 12
        int    last_reqid;        // 16
        void  *socket;            // 24
        int    connection_type;   // 32
        int    max_bson_size;     // 36
        int    max_message_size;  // 40
        int    tag_count;         // 44
        char **tags;              // 48
        char  *hash;              // 56
        mongo_connection_deregister_callback *cleanup_list; // 64
} mongo_connection;

I've added the offsets as they are on a 64-bit platform as comments.
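
The offsets in the comments can be reproduced with a small calculation. This is a sketch that assumes a typical 64-bit Linux platform (8 bytes for time_t and pointers, 4 bytes for int) and that each member is aligned on a multiple of its own size; the computeOffsets() helper is hypothetical:

```php
<?php
// Member name and size in bytes; the sizes are assumptions for a typical
// 64-bit Linux (LP64) platform.
$members = [
        [ 'last_ping',        8 ], // time_t
        [ 'ping_ms',          4 ], // int
        [ 'last_ismaster',    4 ], // int
        [ 'last_reqid',       4 ], // int
        [ 'socket',           8 ], // pointer
        [ 'connection_type',  4 ], // int
        [ 'max_bson_size',    4 ], // int
        [ 'max_message_size', 4 ], // int
        [ 'tag_count',        4 ], // int
        [ 'tags',             8 ], // pointer
        [ 'hash',             8 ], // pointer
        [ 'cleanup_list',     8 ], // pointer
];

// Hypothetical helper: calculates the offset of each member in the struct.
function computeOffsets( array $members )
{
        $offsets = [];
        $offset  = 0;
        foreach ( $members as $member )
        {
                list( $name, $size ) = $member;

                // Add padding so that the member starts on a multiple of its size.
                if ( $offset % $size !== 0 )
                {
                        $offset += $size - ( $offset % $size );
                }

                $offsets[$name] = $offset;
                $offset += $size;
        }
        return $offsets;
}

$offsets = computeOffsets( $members );
foreach ( $offsets as $name => $offset )
{
        printf( "%-17s %2d\n", $name, $offset );
}
```

Note how socket ends up at offset 24 and not at 20: four bytes of padding are inserted after last_reqid so that the pointer is aligned on 8 bytes.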

In C the code cursor->connection->hash gets converted to an address as follows:

  1. Take the address in cursor

  2. Find the offset for the connection member (32)

  3. Find the address that is stored at offset 32 (In my example, that'd be NULL)

  4. Find the offset of hash in the struct that connection represents (56)

  5. Read the pointer data (8 bytes data) at the address from step 3 plus the offset of step 4.

This reads 8 bytes of data from address 56 (0x38) which is not valid, and hence Valgrind produces the error message, and then kills the process.
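
The five steps can be walked through with a made-up memory map. In this sketch the address of the cursor struct (0x1000) is an arbitrary assumption, and the array merely stands in for the memory of the process:

```php
<?php
// Fake "memory": address => 8-byte value. The address of the cursor struct
// (0x1000) is made up for this illustration.
$cursor = 0x1000;
$memory = [
        $cursor + 32 => 0, // cursor->connection contains NULL
];

$connectionOffset = 32; // connection follows the 32 bytes of zend_object
$hashOffset       = 56; // offset of hash in mongo_connection

// Steps 1 to 3: read the address that is stored in cursor->connection.
$connection = $memory[$cursor + $connectionOffset];

// Steps 4 and 5: add the offset of hash to find the address to read from.
$readAddress = $connection + $hashOffset;

printf( "Reading 8 bytes at address 0x%X\n", $readAddress );
```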

In short, an error of the form Address 0x38 is not stack'd, malloc'd or (recently) free'd, with a low address in the message, often means a NULL-pointer dereference.

Categories: Open Source, PHP Community

Hunting for Postboxes (part 1)

Tue, 28/01/2014 - 11:11
postbox-pebble.png

The new year brings new hobby projects! This year, I am going to try to photograph as many UK postboxes as I can. For this I am going to build some apps and overview maps, but I will write more about those later. A teaser for a future post is here to the left ;-).

The first thing that I want my application(s) to do is to give a description of where a postbox is located, and this post is the tale of how I went about doing just that.

As my data source, I am using an OpenStreetMap extract for the London region. I import that through my "Import OSM data into MongoDB" script that I wrote about in an earlier article. The script is available on GitHub as part of the 3angle repository. The latest version is at https://raw.github.com/derickr/3angle/master/import-data.php and it also requires https://raw.github.com/derickr/3angle/master/classes.php for some GeoJSON helper classes and https://raw.github.com/derickr/3angle/master/config.php where you can set the database name and collection name (in my case, demo and poiConcat).

My goal is to get from a longitude/latitude pair to the closest postbox's reference number (EC2 201), a description like (On Paul Street, on the corner with Epworth Street), and a direction and distance (East, 86m).

The first thing I do is to look up the closest postbox. I can very easily do that with the following aggregation query in MongoDB:

<?php
$m = new MongoClient( 'mongodb://localhost' );
$d = $m->selectDb( 'demo' );
$c = $d->selectCollection( 'poiConcat' );

$center = new GeoJSONPoint( (float) $_GET['lon'], (float) $_GET['lat'] );

$res = $c->aggregate( array(
        '$geoNear' => array(
                'near' => $center->getGeoJson(),
                'distanceField' => 'distance',
                'distanceMultiplier' => 1,
                'maxDistance' => 5000,
                'spherical' => true,
                'query' => array( TAGS => 'amenity=post_box' ),
                'limit' => 1,
        )
) );

I am using the aggregation framework instead of a normal query because the aggregation framework also can return the distance to the found object. The above returns $res, and our first result is located in $res['result'][0], which looks like:

array(
        '_id' => 'n905143645',
        'l' => array(
                'type' => 'Point',
                'coordinates' => array( -0.1960, 51.5376 ),
        ),
        'm' => array(
                'v' => 5,
                'cs' => 14576087,
                'uid' => 37137,
                'ts' => 1357659884,
        ),
        'ts' => array(
                0 => 'amenity=post_box',
                1 => 'operator=Royal Mail',
                2 => 'ref=NW6 14',
        ),
        'ty' => 1,
        'distance' => 162.15002059040299,
),

So postbox NW6 14, located at -0.1960, 51.5376, is 162.15 meters away from the requested location at -0.1979, 51.5385, as you can see in this image:

postbox1.png
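
The distance can be roughly cross-checked in plain PHP, and the same two points also give a compass direction. This is only a sketch: the haversine formula, the Earth radius of 6378137 meters, the eight-point compass and both function names are assumptions, and not necessarily what MongoDB or the application use. Because the coordinates in the text are rounded to four decimals, the result only approximates the 162.15 meters from above:

```php
<?php
// Great-circle distance in meters between two longitude/latitude pairs,
// calculated with the haversine formula. The radius is an assumption.
function haversineDistance( $lon1, $lat1, $lon2, $lat2, $radius = 6378137.0 )
{
        $phi1    = deg2rad( $lat1 );
        $phi2    = deg2rad( $lat2 );
        $dPhi    = deg2rad( $lat2 - $lat1 );
        $dLambda = deg2rad( $lon2 - $lon1 );

        $a = pow( sin( $dPhi / 2 ), 2 )
                + cos( $phi1 ) * cos( $phi2 ) * pow( sin( $dLambda / 2 ), 2 );

        return 2 * $radius * asin( sqrt( $a ) );
}

// Compass direction (eight points) when travelling from point 1 to point 2.
function compassDirection( $lon1, $lat1, $lon2, $lat2 )
{
        $phi1    = deg2rad( $lat1 );
        $phi2    = deg2rad( $lat2 );
        $dLambda = deg2rad( $lon2 - $lon1 );

        $y = sin( $dLambda ) * cos( $phi2 );
        $x = cos( $phi1 ) * sin( $phi2 ) - sin( $phi1 ) * cos( $phi2 ) * cos( $dLambda );

        // Initial bearing in degrees, normalised to the range 0 to 360.
        $bearing = fmod( rad2deg( atan2( $y, $x ) ) + 360, 360 );

        $names = [
                'North', 'North-East', 'East', 'South-East',
                'South', 'South-West', 'West', 'North-West',
        ];
        return $names[ (int) round( $bearing / 45 ) % 8 ];
}

// From the requested location to postbox NW6 14.
$distance  = haversineDistance( -0.1979, 51.5385, -0.1960, 51.5376 );
$direction = compassDirection( -0.1979, 51.5385, -0.1960, 51.5376 );

printf( "%s, %dm\n", $direction, round( $distance ) );
```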

To find a description of where the postbox is, we first find the closest road:

$r = $res['result'][0];
$query = [
        LOC => [
                '$near' => $r['l']
        ],
        TAGS => new MongoRegex(
                '/^highway=(trunk|pedestrian|service|primary|secondary|tertiary|residential|unclassified)/'
        )
];
$road = $c->findOne( $query );

highway is an OpenStreetMap tag that is also used for footpaths, service roads and alleys. These generally don't have names, so with the regular expression we restrict the query to only return "normal" roads. After executing this query, $road now contains:

array (
        '_id' => 'w4442243',
        'ty' => 2,
        'l' => array (
                'type' => 'LineString',
                'coordinates' => array (
                        array (
                                -0.2046823,
                                51.5346008,
                        ),
                        …
                        array (
                                -0.1940129,
                                51.5384693,
                        ),
                ),
        ),
        'ts' => array (
                'hgv=destination',
                'highway=secondary',
                'lit=yes',
                'name=Brondesbury Road',
                'note=Signed as maxweight 7.5T for goods vehicles except for access, so have tagged as hgv=destination',
                'ref=B451',
                'sidewalk=both',
                'source:ref=OS OpenData StreetView',
        ),
        'm' => array (
                'v' => 15,
                'cs' => 18802367,
                'uid' => 37137,
                'ts' => 1384017096,
        ),
)

As an image this looks like:

postbox2.png

We are interested only in the name (name=Brondesbury Road) and the geometry (l). Right now, we can already assemble the description NW6 14, on Brondesbury Road. But we also want to know the closest cross road, which we can find by finding all roads that intersect with our geometry (in l) by running the following query:

$q = $c->find( [
        'l' => [
                '$geoIntersects' => [ '$geometry' => $road['l'] ]
        ],
        'ts' => new MongoRegex(
                '/^highway=(trunk|pedestrian|service|primary|secondary|tertiary|residential|unclassified)/'
        ),
        '_id' => [ '$ne' => $road['_id'] ],
] );

This returns nineteen roads. An extract looks like:

array(19) {
        'w4211713' => array(5) {
                '_id' => string(8) "w4211713"
                'ty' => int(2)
                'l' => array(2) {
                        'type' => string(10) "LineString"
                        'coordinates' => array(22) { ... }
                }
                'ts' =>
                array(4) {
                        [0] => string(19) "highway=residential"
                        [1] => string(15) "maxspeed=20 mph"
                        [2] => string(23) "name=Brondesbury Villas"
                        [3] => string(13) "sidewalk=both"
                }
                …
        }
        'w245650577' =>
        array(5) {
                '_id' => string(10) "w245650577"
                'ty' => int(2)
                'l' => array(2) {
                        'type' => string(10) "LineString"
                        'coordinates' => array(5) { ... }
                }
                'ts' =>
                array(6) {
                        [0] => string(15) "highway=primary"
                        [1] => string(7) "lit=yes"
                        [2] => string(15) "maxspeed=30 mph"
                        [3] => string(22) "name=Kilburn High Road"
                        [4] => string(6) "ref=A5"
                        [5] => string(13) "sidewalk=both"
                }
                …
        }
}

As an image this looks like:

postbox3.png

We are only interested in the roads that have a name, and that have a different name than the road we have run the intersection query for. In some cases, OpenStreetMap splits up a road into more than one segment carrying the same name. We discard both of those in a loop and are then left with an array of intersecting road IDs in the $intersectingWays variable:

$intersectingWays = array();
foreach ( $q as $crossRoad )
{
        $crossTags = Functions::split_tags( $crossRoad[TAGS] );
        if ( !in_array( "name={$roadName}", $crossRoad[TAGS] ) && array_key_exists( 'name', $crossTags ) )
        {
                $intersectingWays[] = $crossRoad['_id'];
        }
}
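
The filtering can be tried out without a database. In this sketch splitTags() is a hypothetical stand-in for Functions::split_tags() (the real implementation in the 3angle repository may differ), and the IDs w1 and w2 are made up:

```php
<?php
// Hypothetical stand-in for Functions::split_tags(): turns "key=value" strings
// into an associative array.
function splitTags( array $tags )
{
        $result = [];
        foreach ( $tags as $tag )
        {
                $parts = explode( '=', $tag, 2 );
                if ( count( $parts ) === 2 )
                {
                        $result[$parts[0]] = $parts[1];
                }
        }
        return $result;
}

$roadName = 'Brondesbury Road';

// Three intersecting ways; w1 and w2 are made-up IDs for illustration.
$crossRoads = [
        [ '_id' => 'w4211713', 'ts' => [ 'highway=residential', 'name=Brondesbury Villas' ] ],
        [ '_id' => 'w1',       'ts' => [ 'highway=secondary', 'name=Brondesbury Road' ] ],
        [ '_id' => 'w2',       'ts' => [ 'highway=service' ] ],
];

$intersectingWays = [];
foreach ( $crossRoads as $crossRoad )
{
        $crossTags = splitTags( $crossRoad['ts'] );

        // Keep only named roads that are not another segment of our own road.
        if ( isset( $crossTags['name'] ) && $crossTags['name'] !== $roadName )
        {
                $intersectingWays[] = $crossRoad['_id'];
        }
}

var_dump( $intersectingWays );
```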

With these IDs, we then search for the closest road(s) to the initially found postbox location:

$res = $c->aggregate( array(
        '$geoNear' => array(
                'near' => $r['l'],
                'distanceField' => 'distance',
                'distanceMultiplier' => 1,
                'maxDistance' => 5000,
                'spherical' => true,
                'query' => [
                        '_id' => [ '$in' => $intersectingWays ],
                        'ts' => [ '$ne' => "name={$roadName}" ]
                ],
                'limit' => 1,
        )
) );

Again, the result in $res is in a similar format to the one before, so I won't repeat that. We use the aggregation framework again so that we also get the distance of this intersecting road to the originally found postbox location. Depending on the distance to the intersecting road, we either use on the corner of (less than 25m) or near if it's further away than 25m. For our example postbox, that makes NW6 14, on Brondesbury Road, near Algernon Road, which is illustrated by this image:

postbox4.png
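
Assembling the description from the pieces found so far could look like the following sketch. The describePostbox() function is hypothetical, and the distance of 60 meters is made up for the example:

```php
<?php
// Hypothetical helper: builds the description from the postbox reference, the
// road it is on, the closest cross road and the distance to that cross road.
function describePostbox( $ref, $roadName, $crossRoadName, $distance )
{
        // Less than 25m away counts as being on the corner.
        $relation = $distance < 25 ? 'on the corner of' : 'near';

        return "{$ref}, on {$roadName}, {$relation} {$crossRoadName}";
}

// The distance of 60 meters is a made-up value.
echo describePostbox( 'NW6 14', 'Brondesbury Road', 'Algernon Road', 60.0 ), "\n";
```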

The full code for this example can be found at https://github.com/derickr/3angle/tree/master/maps-postbox and you can see it in action (for London) at: http://maps.derickrethans.nl/?l=postbox,lat=51.5&lon=-0.128&zoom=17

Categories: Open Source, PHP Community

Smoothing lines with splines

Tue, 21/01/2014 - 10:44

For my OpenStreetMap year of edits videos I use PovRay to animate edits of OpenStreetMap. Over the course of a year I show all the edits that have been made. In previous years, I used a simple sine function to rotate the Earth along its North/South and East/West axes. This year I wanted to highlight specific events so I needed a different method to rotate the Earth.

I started out with finding events and created two paths for them:

path2.png

This one starts in Greenland, moves towards Brazil and then heads over Russia, India, the Philippines and ends in Indonesia.

path1.png

The second one starts in Korea, moves through Africa, heads towards Antarctica and then heads back to France over Brazil.

Each of the two paths has points highlighted which are (about) 15 days apart. I converted the paths into data files, which look like:

0000  -24   70   8500
0180  -64   58   9500
0240  -81   29  10000
0480  -43  -16  15000
0720   -8   36  20000
0840   27   44  15000
1020   76   20  15000
1140  115   25  15000
1320  127   -2   9000
1460  157  -24  10000

The first column is the frame number (four frames per day), then the longitude and the latitude and finally the height of the camera over the point. As you can see, I have not included a line for each of the points in the path images.

From the points in the data files, I then needed to generate all the intermediate points for each of the three columns. My first (naive) attempt was to simply do a linear interpolation. Point 1's longitude can be calculated by: -24 + (1 - 0) * ((-64 - -24) / (180 - 0)) and point 856's longitude with: 27 + (856 - 840) * ((76 - 27) / (1020 - 840)). Doing that for all three columns (for the first 300 frames) results in the following video:

As you can see, the changes in direction (and height) are really abrupt, and don't make for a nice smooth rotating body. So I had to come up with something else: a smoothed line.
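
The naive linear interpolation described above can be sketched as follows. The interpolate() function is a hypothetical helper and not the script used for the video; the key frames are the rows of the data file shown earlier:

```php
<?php
// Key frames: frame number, longitude, latitude, camera height. The frame
// numbers have no leading zeros, as PHP would read 0180 as an octal number.
$keyFrames = [
        [    0,  -24,  70,  8500 ],
        [  180,  -64,  58,  9500 ],
        [  240,  -81,  29, 10000 ],
        [  480,  -43, -16, 15000 ],
        [  720,   -8,  36, 20000 ],
        [  840,   27,  44, 15000 ],
        [ 1020,   76,  20, 15000 ],
        [ 1140,  115,  25, 15000 ],
        [ 1320,  127,  -2,  9000 ],
        [ 1460,  157, -24, 10000 ],
];

// Hypothetical helper: linearly interpolates one column (1 = longitude,
// 2 = latitude, 3 = height) for a frame between two key frames.
function interpolate( array $keyFrames, $frame, $column )
{
        for ( $i = 0; $i < count( $keyFrames ) - 1; $i++ )
        {
                $a = $keyFrames[$i];
                $b = $keyFrames[$i + 1];

                if ( $frame >= $a[0] && $frame <= $b[0] )
                {
                        $slope = ( $b[$column] - $a[$column] ) / ( $b[0] - $a[0] );
                        return $a[$column] + ( $frame - $a[0] ) * $slope;
                }
        }

        return null; // the frame is outside of the range of the key frames
}

printf( "Longitude at frame 1:   %.4f\n", interpolate( $keyFrames, 1, 1 ) );
printf( "Longitude at frame 856: %.4f\n", interpolate( $keyFrames, 856, 1 ) );
```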

There are various ways of smoothing a line, but one of the easier ones that I found was the Spline. A spline is a smoothing function that connects all the data points with sufficiently smooth curves. I used the implementation at http://www.script-tutorials.com/smooth-curve-graphs-with-php-and-gd/ with the other script available through https://github.com/derickr/osm-year-in-edits/blob/master/changes/create-camera-from-file.php
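
To give an idea of what a spline does, here is a minimal sketch of one particular kind, a Catmull-Rom spline. This is not the implementation linked above; the catmullRom() function and the evenly spaced t values are assumptions made purely for illustration:

```php
<?php
// Catmull-Rom interpolation between $p1 and $p2 for $t between 0 and 1; $p0
// and $p3 are the neighbouring points that shape the curve.
function catmullRom( $p0, $p1, $p2, $p3, $t )
{
        return 0.5 * (
                2 * $p1
                + ( -$p0 + $p2 ) * $t
                + ( 2 * $p0 - 5 * $p1 + 4 * $p2 - $p3 ) * $t * $t
                + ( -$p0 + 3 * $p1 - 3 * $p2 + $p3 ) * $t * $t * $t
        );
}

// Camera heights from the last three rows of the data file; the final height
// is repeated because there is no point after it.
list( $p0, $p1, $p2, $p3 ) = [ 15000, 9000, 10000, 10000 ];

foreach ( [ 0, 0.25, 0.5, 0.75, 1 ] as $t )
{
        printf( "t=%.2f height=%.2f\n", $t, catmullRom( $p0, $p1, $p2, $p3, $t ) );
}
```

The curve passes through both data points, but because it arrives at 9000 while still heading downwards it first drops below 9000 before climbing to 10000. A smooth curve through the points can therefore overshoot the data.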

If we now look at the same section of the video, we see that the turns and zoom level are much smoother:

Just focussing on the zoom aspect, I am also reproducing a line graph of the interpolated points here:

spline.png

As you can see, between 1172 and 1328 it makes a very big dip, and that wreaked havoc with my video as it zoomed in too much. I've fixed that manually for the video, but I would like to find an algorithm that produces smooth lines without such a big "dip". Any ideas are most welcome.

Categories: Open Source, PHP Community