Wednesday, May 6. 2009
Barcamp Northeast

I've spoken previously about BarcampScotland, which was great in 2008 but which I missed in 2009 because I was down in Newcastle attending the inaugural UK Maker Faire. I was disappointed to be missing the barcamp but had so much fun down in Newcastle that I'm glad I made the choice I did.
One of the people I spoke to during the Maker Faire mentioned that there was a barcamp being planned in Newcastle as well. This was great news. Having-your-cake-and-eating-it great news. All the fun of the Maker Faire and an opportunity to get my barcamp fix as well.
When tickets became available I immediately registered and if you're close to Newcastle and/or are willing to travel I would encourage you to register as well. There are still some tickets left.
Thursday, September 25. 2008
Plugins for PHP frameworks
A week or so ago I came across django-monetize. It’s a plugin for the Django Python framework which makes it quick and easy to display adverts on a site. As far as I can see there is nothing stunningly new about it, but at the same time I can see how it would speed up development. It’s a good, simple wheel that should save you from reinventing it for each project you work on. This got me thinking about PHP frameworks.
I’m most familiar with Zend Framework (ZF), so if I’ve missed something interesting in one of the other frameworks please let me know in the comments, but ZF doesn’t seem to have anything like this. Going off at a tangent for a moment: if you consider yourself a baker, or would like to learn more about baking, stick around (or skip) to the end, where there is something you may be interested in. Also, I have to say I find the idea of “baking a CakePHP app” very satisfying. Solar developers produce solar systems, which sounds equally exciting. What’s up with ZF? Anyway, back to the point . . .
The Magento eCommerce system could be considered a “plugin” for ZF, and Wildflower is a timely example of a CMS based on CakePHP. I'm sure there are dozens of other blogs, CMSs, forums and wikis based on other frameworks, but these aren't really what I'm thinking about. There are a variety of smaller tasks in any moderately complex project which could be handled by plugins. An emailing system with double opt-in, templating, tracking, unsubscribe and bounce handling might be one possibility. Site registration with signup, sign-in, sign-out, password recovery and email confirmation might be another, though this might be reaching the point where the difficulty of customising an existing system matches the difficulty of building one from scratch.
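To make the idea concrete, here is a rough sketch of the sort of shape such a plugin might take. This is purely illustrative: the interface and class names (SitePlugin, MailingList) are made up for this post, not taken from ZF or any other framework.

```php
<?php
// A hypothetical plugin contract: the host application only needs to
// know this interface, not the internals of each plugin.
interface SitePlugin
{
    public function getName();
    public function register(array $config);
}

// A sketch of the emailing-system plugin mentioned above.
class MailingList implements SitePlugin
{
    private $config = array();

    public function getName()
    {
        return 'mailing-list';
    }

    public function register(array $config)
    {
        // The host hands over its settings; the plugin wires itself up.
        $this->config = $config;
    }
}

$plugin = new MailingList();
$plugin->register(array('double_opt_in' => true));
echo $plugin->getName();
```

The point is less the code than the boundary: the framework owns routing and authentication, while each plugin owns one well-defined task.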
I do think that Magento and similar projects stretch the idea of a plugin beyond breaking point, but there may be instances where it makes sense. One of the issues I've had with projects like phpBB and WordPress is that they only play nicely with other projects if they're in charge. Being able to take eCommerce, forum and blog projects and slap them all on top of one framework, all working from one authentication layer with similar templating systems, could be a refreshing change.
This post has been little more than a brain dump but I think it presents an interesting possibility. I'm going to continue thinking about which bits of functionality can be sensibly forked off into a separate system. If you have any ideas I would love to read them in the comments.
Baking
Earlier on I said there would be something for bakers and/or those wanting to learn to bake. At the tail end of last month (where has September gone?!) I was approached by the folks over at Packt Publishing about reviewing “CakePHP Application Development”. Unfortunately I didn’t feel I had the time to do it justice so I had to pass. If this sounds like a book you may be interested in, you have a couple of options for finding out more. There is a chapter [PDF] available for free online, and Jonathan Snook has a review up.
Sunday, August 10. 2008
Book Review: “Practical Web 2.0 Applications with PHP” by Quentin Zervaas
In addition to Pro PHP I also received the book “Practical Web 2.0 Applications with PHP” from Apress. This is a very different book to Pro PHP. Whereas Pro PHP introduced a variety of fairly advanced topics and then left it up to us to decide when and where to implement them in our own projects, this book focuses on keeping things simple and walks us step by step through bringing a project from concept to deployment. The audience for this book is not going to be the same as that for Pro PHP. If you are already comfortable taking a project from concept to a working application this book will have little for you. If you are comfortable working with PHP, able to put together standalone tools and pages, and can perhaps develop a WordPress plugin, but have not yet created a complete site from scratch, then this may be the book that helps you “step up a gear”.
The phrase “Web 2.0” is (ab)used all too often these days, but there isn’t (too) much to worry about with its use here. Web 2.0 is used as a convenient way to introduce standards-compliant HTML, AJAX, microformats and mashups. Through the 545 pages of this book we slowly build up a blogging platform (think wordpress.com) supporting images, tagging and geographical data displayed using Google Maps. This is brought together well, with the possible exception of Google Maps, which feels as though it has been forced into the site concept so the author can discuss web services.
This is unfortunate as, in a book which was very easy to follow, the chapter dealing with implementing Google Maps was particularly good. For a book like this, which deals with issues for which there are almost as many “right” answers as there are PHP developers, it is easy to find things I would have done differently. I would have developed a geographically aware image hosting app with optional blogging, whereas the author developed a blogging app with optional images and geographical data. I would have created a users table with each attribute in a separate row, whereas the author followed what I can best describe as a denormalised EAV approach. None of this actually matters; this book sets out to offer one approach to implementing a feature-rich and complete website, and in this goal it succeeds admirably.
The book begins in chapter one by planning the application, and touches on a few other issues including search engine optimisation, commenting, unit testing and version control. Perhaps strangely, unit testing and version control are encouraged but not used in this book. The format of the book, creating basic functionality in the earlier chapters and then building on it later, would certainly work well with both. The concept is introduced “as-is”, and although it works as a summary of what is to come I would have liked to see some discussion of approaches the reader might take in arriving at such a specification.
Chapter two deals with setting up the web server, application directories, downloading the various libraries which are going to be used, and then setting up logging functionality. The Zend Framework (ZF) is used as the basis for the application. At the time this book was written the latest version of ZF was 1.0.2. This means that components like Zend_Form, which are now popular parts of ZF, were not available. Instead the author gives us a couple of scripts he has previously developed to handle form processing and interacting with the database. PEAR packages and Smarty are also used on the server side, with Prototype and script.aculo.us on the browser side.
With the exception of chapter five, which introduces Prototype and script.aculo.us, chapters three to thirteen build up the application. As might be expected, code samples dominate this book. The accompanying explanations are detailed and easy to follow though, and by building on previous chapters they avoid coming across as repetitive.
Chapter fourteen looks at deployment and maintenance. Much of this chapter deals with building out the application's logging functionality, handling site errors and adding an administration area to the site. Although these are all important areas, I feel the chapter has been misnamed: it only goes on to look at deployment and managing backups in the final few pages of the book. Alternate config settings for development and production servers are dealt with particularly well. I've previously seen discussion of Zend_Config's ability to handle inheritance of settings, and of how the application can "know" which server it is on and use the appropriate settings, but this book is the first place I've seen a practical implementation.
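For readers who haven't come across the inheritance idea, a sketch of the ZF 1.x INI format (from memory, not taken from the book; the setting names are invented) looks something like this:

```ini
; config.ini - the "[development : production]" header makes the
; development section inherit every production value it doesn't override
[production]
db.host = db.example.com
db.name = app
log.level = warn

[development : production]
db.host = localhost
log.level = debug
```

Loading this with Zend_Config_Ini and the section name matching the current server gives the application the merged view, so only the values that genuinely differ between environments need to be repeated.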
Overall this is a very solid and practical guide to creating web (2.0) applications from scratch with some real gems thrown in. It isn’t going to be for everyone but if you are looking to move from working on small projects to complete applications this book will likely speed you on your way.
P.S. If you're interested in the book, be sure to check out the sample chapter, and if you decide to buy don't ignore the supporting website.
Wednesday, July 2. 2008
Random thoughts on random strings
I first started to think about random strings when going through the process of registering an application for the Windows Live Delegated Authentication service. As part of the application you are asked to provide a secret key. You want this to be difficult to guess, so a random string is going to be best. Humans are astoundingly bad at being random, so I just slapped the keyboard a few times until I felt I had the required 16 characters.
Writing some code to produce a fairly random string is incredibly easy. I've easily done it a dozen times or more, though only because it is easier to re-write than to find where I put the last one. It generally looks something like this:
<?php
// Every character we're happy to use, typed out by hand. Stick to
// single-byte ASCII here: a multi-byte character such as £ would be
// corrupted by the byte-based indexing below.
$charString = '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,./<>?;#:@~[]{}-_=+)(*&^%$"!';
$length = strlen($charString);
$output = '';
for ($a = 0; $a < 16; $a++) {
    $output .= $charString[mt_rand(0, $length - 1)];
}
echo $output;
Running that creates a string something like '4e)+bSuv#kN^"O)f'. Suitably random. Well, pseudo-random.
This isn't the only way to generate a random string. You could take a similar approach but use the chr function instead.
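A chr-based version might look something like this (a sketch: code points 33 to 126 are the printable ASCII characters, excluding the space):

```php
<?php
// chr() turns an ASCII code point into its character, so there is no
// need to type out the list of characters by hand. Code points 33-126
// cover digits, letters and punctuation.
$output = '';
for ($a = 0; $a < 16; $a++) {
    $output .= chr(mt_rand(33, 126));
}
echo $output;
```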
The output is just as good and you don't have to type out every character you want to use. It was only recently, when working on a puzzle posted by Marco Tabini, that I considered using chr. This got me thinking about what other options there were.
The uniqid function is one possibility. It does seem to take about twice as long as the first option listed here though. The manual page also contains this gem:
more_entropy
If set to TRUE, uniqid() will add additional entropy (using the combined linear congruential generator) at the end of the return value, which should make the results more unique.
The emphasis is mine.
Other options include the hashing functions. md5 and sha1 would be two options.
There are a couple of problems with uniqid, md5 and sha1 though. Firstly they all return strings of a set length. You would need to use substr to get a shorter string and chain multiple calls together to get a longer string.
The second problem is that the characters these functions use are limited to lowercase letters and numbers. You no longer have the uppercase letters and punctuation. That is going to make any string easier to guess.
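Both limitations are easy to see in a short sketch (sha1 used here, but md5 behaves the same way with 32 characters instead of 40):

```php
<?php
// sha1() always returns 40 hexadecimal characters, so a shorter
// string needs substr() and a longer one needs several calls joined.
$short = substr(sha1(mt_rand()), 0, 16);
$long  = sha1(mt_rand()) . sha1(mt_rand());
// Every character is drawn from 0-9 and a-f only, a much smaller
// alphabet than the full printable set used earlier.
echo $short . "\n" . $long;
```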
Your first idea is sometimes the best
For me I'm perfectly happy generating a random string one character at a time. In the future I'll likely generate random strings in much the same way I always have with one small alteration. I'll replace the long list of characters with chr. There's no point typing more than I have to.
Sunday, June 29. 2008
Windows Live Contacts coming to PEAR
I've spoken previously about Windows Live Contacts but never really did much with it. I didn't have an immediate use for it, and I was growing increasingly apathetic about the entire area of contact grabbers/importers. It was a shame really, as it was a really exciting project, with Microsoft leading the way in the area. It's only recently that Google and Yahoo have caught up and released their own APIs for accessing their users' data.
I've moaned about how great it would be if we could get a user's contacts without having to ask for their password. With services like Windows Live Contacts this is finally possible.
With the possibility of actually using the code myself creeping up on the horizon I decided to put the time in to write wrappers for PHP. It can be broken down into two components.
Windows Live Delegated Authentication
The first thing we need to do is get permission from the user to access their data. There was already a PHP wrapper for this but it did far more than I needed, so I've rewritten it and ignored the parts I don't expect to need. This evening I submitted it to the PEAR proposal process.
Windows Live Contacts
The second step is fetching the contacts for the user after you have their permission. I could only find a small test script for this so a more complete implementation was definitely needed. Again, I've just submitted the code for this to the PEAR proposal process.
Both of these packages will likely undergo changes as they go through the proposal process but if you can't wait to get started the files are available to be installed now on the proposal pages. The easiest way is using the PEAR installer. If you haven't used PEAR before please take a look at the manual. If you're still unsure of anything post a comment below.
Saturday, June 7. 2008
Book Review: “Pro PHP: Patterns, Frameworks, Testing and More” by Kevin McArthur
At the start of May I received (along with a couple of other people it seems) a couple of books from Julie Miller at Apress publishing with the sole condition being that I post a short review. Liking to think[1] that I would do this anyway it seemed like an offer I couldn’t refuse. So here goes . . .
When the title talks about patterns, frameworks, testing and more it’s not kidding. Kevin McArthur has managed to stuff a lot of information into the three hundred and some pages which make up this book. The inevitable trade-off is that no one section is a complete introduction to the subject it’s covering. Despite this the book is filled with what I can only describe as, “Ah-hah!” and “Doh!” moments. Explanations that suddenly clear away confusions or present better ways of doing something which in hindsight seem so obvious but clearly weren’t beforehand. If this seems sickeningly positive so far it’s because judging the book as a whole there really isn’t anything I can find to criticise. One criticism that has been raised is that for a book titled “Pro” it doesn’t cover enough “enterprise”-y[2] subjects. Greater emphasis could have been given to some concepts but many of the ideas I associate with “enterprise”-y projects are here. Lacking any general aspects to criticise I’ll “go to town” on the individual sections . . .
Continue reading "Book Review: “Pro PHP: Patterns, Frameworks, Testing and More” by Kevin McArthur"
Monday, May 5. 2008
Pre-populating forms with the timezone
Following my initial discussion on using geo-targeting to predict timezones I've finally found time to play around with some of these ideas.
The idea
Simplify the user experience by predicting the timezone someone is in and auto-populating a registration form accordingly.
The solution
At the time I wasn't really looking for any solutions other than matching an IP address up to an actual location. This is certainly possible, and I'll show you how below, but first I'll highlight the alternative. If you are reasonably confident that the timezone will be set correctly on the user's computer, it is possible to access that value via javascript.
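A minimal sketch of the javascript approach (the hidden-field name in the comment is made up for illustration):

```javascript
// getTimezoneOffset() returns the local offset from UTC in minutes,
// with the sign flipped: UTC+1 gives -60. Dividing by -60 turns it
// into the more familiar "hours ahead of UTC" value.
var offsetMinutes = new Date().getTimezoneOffset();
var offsetHours = -offsetMinutes / 60;
// In a registration page this would be written into a hidden form
// field before submission, e.g.
// document.getElementById('timezone').value = offsetHours;
console.log(offsetHours);
```

The value then arrives with the rest of the form data and can be used to pre-select the timezone dropdown server-side.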
GeoIP to Timezone
If you want to avoid relying on javascript then you still have some options.
Maxmind: The PEAR package supporting Maxmind's GeoIP databases doesn't support timezone lookup, but the script offered by Maxmind directly does. The usage seems a little convoluted but it is there.
IP2Location: The timezone information is integrated directly in the database once you reach 'DB11', their 11th database offering. The usage should be straightforward. There is only one drawback - DB11 costs $649/year. Personally this is more than I would be willing to spend, but their client list is impressive, so if this is within your budget spread the moolah and use my affiliate link.
Free solution: There are still a few options here. Although I haven't tried it, I believe the Maxmind code should work with their GeoLite products. Alternatively, once you have a location, either using Maxmind's free databases or hostip.info, you could try getting to a timezone using the 'world time engine' class available from phpclasses.org. The drawback with this approach is that even after the database lookup you still need to query two web services. That is going to introduce a significant lag in response time.
My preference
Although I started off looking at GeoIP services I would much prefer to be able to tackle this problem using javascript. It's not an ideal solution, some people will have their timezone set incorrectly while others will have javascript disabled but on balance I think it is good enough. This is the icing on the cake rather than core functionality after all.
Thursday, May 1. 2008
Why don't geoip services accept feedback?
I've recently been playing around with geoip databases looking at implementing the type of timezone prediction I previously discussed. I'll be writing a blog post to cover that in the next week or so but first wanted to mention something which has been puzzling me.
Why don't geoip services accept feedback?
A good example of accepting feedback is Akismet. There you can report messages wrongly let through as spam, and messages incorrectly labelled as spam as ham. I couldn't find an official statement on their accuracy, but figures as high as 99.9% are mentioned. Defensio, a direct competitor, gives an accuracy rate of 99.77%. From my own experience using Akismet, an error rate of 1 in every 1000 seems about right.
In contrast, Maxmind, who is, to my knowledge, the commercial leader in GeoIP, states their accuracy as,
Over 99% accurate on a country level, 90% accurate on a state level, 81% accurate for the US within a 25 mile radius.
I don't have enough information to query their accuracy in the US, but from personal experience it falls significantly short for the rest of the world.
With feedback from their users I think this accuracy could be substantially increased.
Hostip.info
Hostip.info, which makes their database freely available, does accept feedback from users. It's a simple process with only three screens and a CAPTCHA to get past. Oh, and you also need cookies enabled. Not many people are going to use this. What is really needed is an automated process, and this system is set up to prevent automated processes.
My 2 cents
Geoip services have the potential to more efficiently utilise their user communities than even the anti-spam services can. Internet users regularly submit their location on forums, social networks and whenever they make a purchase. These are all situations where predictions based on geoip lookups can create a more pleasant user experience. If the user doesn't change the predicted location then all is well. If the user does change the location though then that information should be sent back to the geoip service so the database can be updated.
Analysis of hostnames will only get you so far. To really boost the accuracy you need human reviewers. If you can convert ordinary internet users into reviewers without inconveniencing them then that's even better.
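The feedback loop described above is easy to sketch. This is an illustration only, not any provider's real API: the report shape and the idea of POSTing it back are my assumptions.

```javascript
// Compare the geoip prediction with what the user actually selected in the
// form, and build a correction report only when they disagree.
function buildCorrection(ipAddress, predictedCountry, selectedCountry) {
  if (predictedCountry === selectedCountry) {
    return null; // prediction was right, nothing to report
  }
  // Hypothetical report format; a real service would define its own.
  return { ip: ipAddress, predicted: predictedCountry, actual: selectedCountry };
}

// A form handler would POST any non-null report back to the geoip service
// so the database could be updated without inconveniencing the user.
```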
Tuesday, April 29. 2008
Considering my development process
This blog has been quiet for a little over a month now as 'real world' events have consumed all my spare time. Overall it has been a month well spent.
A new start
With a new project starting I feel it's a good time to reflect on experiences with past projects and take a look at what are currently considered to be 'best practices'.
Frameworks
The first decision I made was to take a closer look at the plethora of frameworks which have sprung up over the past year or so. I decided to give Zend Framework a try and so far have found it to offer what I need. My main concern was that it would be too inflexible. I've wanted to deviate from the defaults three or four times now in what I would consider non-trivial ways, and found that by hitting my code base with the manual a couple of times it would eventually yield to my will. There is certainly a learning curve to master, but on balance I believe the benefits will be worth it.
Source Control
Previously my source control has been embarrassingly bad. I still have backups which contain backups which contain old code archives to remind me. I suspect there is useful code in there somewhere but it has reached the point now where finding it is so demoralising that I prefer to pretend it doesn't exist and start again.
Recently I've been playing with Subversion. It's definitely something I want to continue using. Currently I have it running on a slightly flaky old computer I turn on when needed, which isn't really an ideal solution. What I really want is an always-on service I can connect to from anywhere. To that end I've been looking around for Subversion hosting.
Following the suggestions in a year-old post by Jonathan Snook, I've been comparing the offerings available. I've put together an Excel spreadsheet which you can download here or view in Google Docs here.
Hopefully that will save someone a little work. Personally I'm edging towards Assembla, which makes 500 MB of svn space available for free and seems reasonably priced should my needs grow. I would be delighted to hear from anyone who uses them now or has in the past.
- PEAR bug triage
- I was able to set aside a little time one weekend for the inaugural PEAR bug triage event. I learnt a lot even if I didn't really achieve much. Definitely something I want to set aside some more time for in the (near) future.
- Edinburgh International Science Festival
- I spent a couple of days (attempting) to exhaust those perpetual motion machines most commonly known as children. I believe I failed entirely. The department I'm based in ran a stall focusing on the heart and healthy living. The experience left me exhausted but with renewed hope that the next generation are not entirely the devils the media would have us believe.
- Munich
- I spent a long weekend in Munich. It seemed like a really nice city. Best of all the trip was free which always adds an extra delight to any experience.
- Tai Chi seminar
- For the second time I went to a weekend seminar on Tai Chi. I've been attending weekly classes for ~ 18 months now but find the focus of these weekend seminars to be valuable. Draining but valuable.
- My next project
- Over the past few days I've finally found time to start work on a new project I've been thinking about for a couple of months.
Wednesday, March 19. 2008
Is PHP good enough for science?
My 'day job' has nothing to do with PHP. It has nothing to do with any form of programming. I graduated in 2006 with a degree in Biochemistry and went on to do an MSc and now a PhD in cardiovascular biology. The closest most of my colleagues come to programming is a formula in an Excel spreadsheet.
It was actually Excel which prompted this post. Yesterday I was analysing some data and bemoaning the poor search functionality that Excel makes available. I had already expanded the small set of experimental data I had with some values pulled from a web service using a quickly hacked together PHP script and it got me to wondering how much better things could be if I just stuck with PHP.
Where's the science?
This train of thought led on to whether PHP has been used all that often for scientific projects. There is an accelerating trend in Biology to make data and tools available via web interfaces. In my opinion this is an environment where PHP excels and yet all the literature I've seen discussing the development of these services uses Perl or occasionally Java.
Searching a little harder for PHP projects yields an equally depressing outlook. In PEAR, Jesus Castagnetto released the Science_Chemistry and Math_Stats packages back in 2003. For my purposes, though, the Chemistry package is a little too 'chemical' and the stats package is a little too basic. On SourceForge there is a package named BioPHP which looks promising, but again there has been no activity since 2003. A lot has happened since then.
Biology is increasingly data generative. There is going to be a steadily increasing need for tools to analyse all this data. These are likely to be centralised and made available via web interfaces.
Anyone out there?
I suspect I'm going to be increasingly creating automated solutions to remove some of the repetition involved in processing the relatively small amounts of data that I generate. A PHP toolkit able to leverage the latest online databases and perform 'advanced' statistics would be immensely valuable.
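To make that wish concrete, here is the sort of routine such a toolkit would need to bundle: sample mean and standard deviation. It's shown in JavaScript purely as a sketch, since the point of this post is that a maintained PHP equivalent is missing.

```javascript
// Arithmetic mean of an array of numbers.
function mean(values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) sum += values[i];
  return sum / values.length;
}

// Sample standard deviation (divides by n - 1, since experimental data is
// usually a sample rather than the whole population).
function sampleStdDev(values) {
  var m = mean(values);
  var squaredDeviations = 0;
  for (var i = 0; i < values.length; i++) {
    squaredDeviations += (values[i] - m) * (values[i] - m);
  }
  return Math.sqrt(squaredDeviations / (values.length - 1));
}
```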
So my question is this. Is anyone out there using PHP in a scientific environment? Are there resources available which I've missed?
Tuesday, March 11. 2008
MSN Contacts web service should be fully available again
The web service for fetching contacts from MSN messenger / Windows Live Messenger has been hit by a series of problems over the past few days. First I hit the bandwidth limit for the account used to host the service. Next, the server used was hit by a DDoS attack. Finally after all that was sorted out the bandwidth limit again caused problems.
Hopefully everything should now be back to normal and the service will be stable.
As always, if you do have any problems, my contact details are available under the 'About' tab at the top of the page.
Friday, March 7. 2008
Geotargeting in forms
I have a love-hate relationship with geo-targeting. The web wasn't designed to make it easy to determine the geographical location of connected computers. A user's geographical location is interesting and potentially valuable, though, and so methods have been developed to make it (almost) possible.
These methods typically involve something akin to a brute force attack. Figure out where enough IP addresses have been assigned and you can get a good idea of where a user is from their IP address. Other methods involve identifying the computers through which they are communicating with you and assuming the user is in the surrounding geographical area. Neither method is perfect but in the majority of cases you can know which country a user is in with reasonable accuracy.
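That 'brute force' lookup can be made concrete: convert the dotted IP address to a 32-bit integer, then binary-search a table of assigned ranges. The two ranges below are invented for illustration; real databases ship millions of them.

```javascript
// Convert a dotted-quad IP to an unsigned 32-bit integer.
function ipToLong(ip) {
  var p = ip.split(".");
  // >>> 0 forces the result back to an unsigned 32-bit value.
  return ((p[0] << 24) | (p[1] << 16) | (p[2] << 8) | (+p[3])) >>> 0;
}

// [startOfRange, endOfRange, country], sorted by start. Made-up sample data.
var ranges = [
  [ipToLong("81.0.0.0"), ipToLong("81.255.255.255"), "EU"],
  [ipToLong("202.0.0.0"), ipToLong("202.127.255.255"), "AU"]
];

// Binary search the range table for the country an IP was assigned to.
function lookupCountry(ip) {
  var n = ipToLong(ip), lo = 0, hi = ranges.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (n < ranges[mid][0]) hi = mid - 1;
    else if (n > ranges[mid][1]) lo = mid + 1;
    else return ranges[mid][2];
  }
  return null; // address not in any known range
}
```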
What I hate about geo-targeting is how some sites think they can locate you more precisely than the country you are in. MaxMind, which is probably the commercial leader, thinks it can guess your nearest city with an accuracy of 81% in the US. Outside of the US I suspect this drops considerably. I'm seeing fewer sites than I once did trying to tell me where I'm connecting to the internet from (and getting it wrong), so I'll skip forward to what I love about geo-targeting.
Continue reading "Geotargeting in forms"
Wednesday, March 5. 2008
Posted by Jonathan Street in Misc, Programming, Web Tools, Website Management, Website Promotion at 14:48
BarCampScotland2008 Roundup
Over a full month after the fact I present my summary of the BarcampScotland2008 event.
The event was split over two days. It kicked off on the Friday evening (1st Feb) in the main room of Alison House at the School of Architecture. It was just the one room, with a decent communal feeling. I think most people were holding back their presentations for the following day. Despite this, two presentations did take place. James Littlejohn made the first presentation after the welcome session and talked about data portability.
It was a good summary of the current situation, and I happen to agree with most of his positions. He has decided to make his homepage the hub of his social network. Although I think he has perhaps taken things a little too far, my main criticism is the implementation. Looking at the site, it took me about 20 seconds to figure out that aboyne is where he is based and not his surname. I couldn't find his surname anywhere on his homepage; after navigating around for a while I found it in the byline for his blog. It wasn't an easy process. I also doubt the data was machine-readable, despite the importance he attached to that during his talk.
Ewan Spence was next up with a talk he largely improvised on the spot. This rapidly turned into a conversation, with some interesting points raised.
Following on from this, Dave McClure set a small competition going. An excessive number of random words were gathered from the audience, which then broke up into 5 groups to brainstorm company ideas around any pair of words. Somehow the team I was in won with sexydyslexia.com, a company that takes standard prose and converts it into netspeak and vice versa. I notice that the domain name is still available, so although it was rated the best, apparently no one in the audience wanted to run with it.
All the details on the second day after the jump . . .
Continue reading "BarCampScotland2008 Roundup"
Sunday, February 3. 2008
BarCampScotland2008 : Initial impressions & slides
The second BarCampScotland event finished yesterday. This was the first BarCamp event I've attended and I have to say I was impressed.
Almost without exception I would describe the talks I attended as interesting or very interesting. I plan to post summaries and link to slides where possible in a later post.
The facilities were also quite impressive. Appleton Tower always looks to me to be a rather drab building from the outside but the concourse, where BarCamp was based, was a deceptively impressive space and certainly met our needs. Having said that I don't know whether they turn the heating off on the weekend but it was definitely chilly.
There were five large lecture theatres (easily seating 100+ people) available on two levels, again well equipped. Given the number of laptops I saw out on display, the power sockets beneath every other seat would have been invaluable. It was only in the penultimate session that I realised they were there, but as that was when I started being concerned about power it all worked out well.
The Slides
I spoke in the morning about contact importers and where I felt they were going in the future.
I've embedded the slidecast below. Feel free to link to it or embed it elsewhere.
Sunday, January 20. 2008
BarCampScotland2008
I plan on attending the second BarCamp event in Edinburgh at the start of next month. If you haven't heard of Barcamp before take a look at their site.
The first BarCampScotland event took place last year, but I didn't know about it at the time, so this will be the first I have attended. The intention with BarCamp seems to be to generate as much online content as possible: videos on YouTube, photos on Flickr, presentations on SlideShare. Wanting to know what I was letting myself in for, I decided to set aside some time today to see what has previously been posted.
There isn't really much content, except photos, from the first BarCampScotland, so I expanded my search to BarCamp in general. There has been some decent content presented by various people. Below are some I found particularly interesting (though in no particular order) . . .
DIY User Research
Leisa Reichelt
This isn't really an area I know much about, though I have the uncomfortable feeling that I should. It is a nice introduction, though be prepared for the slides to run out before the 'story' is complete.
Leisa also presented at FOWA. Her slides and audio are also available.
How to Scale
George Palmer
This is perhaps the most complete overview of scaling a web application I've seen. I suspect the actual presentation would have felt something like a whirlwind going overhead but as we can take our time over each slide it's manageable, and very informative.
Introduction to SlideShare
Kapil Mohan
An introduction to SlideShare at BarCamp and posted to SlideShare. Naturally.
In my opinion they were the best 3 of the first 60. There were 275 in total. If you want to see more you can continue the search.
BarCampScotland
There were not many but there were some . . .
p.s. SlideShare supports audio. When did that happen? Lack of audio support was my main objection to the idea. Having said that, none of the presentations have audio, which is a shame.

