Wednesday, July 2. 2008
Random thoughts on random strings
I first started to think about random strings when going through the process of registering an application for the Windows Live Delegated Authentication service. As part of the application you are asked to provide a secret key. You want this to be difficult to guess, so a random string is going to be best. Humans are astoundingly bad at being random and I just slapped the keyboard a few times until I felt I had the required 16 characters.
Writing some code to produce a fairly random string is incredibly easy. I've done it a dozen times or more, though only because it is easier to rewrite it than to find where I put the last one. They generally look something like this:
<?php
// Pool of candidate characters. '£' is omitted: it is multi-byte in UTF-8,
// and indexing the string byte by byte would split it.
$charString = '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,./<>?;#:@~[]{}-_=+)(*&^%$"!';
$length = strlen($charString);
$output = '';
// Append 16 characters, each taken from a random offset in the pool.
for ($a = 0; $a < 16; $a++) {
    $output .= $charString[mt_rand(0, $length - 1)];
}
echo $output;
Running that creates a string something like '4e)+bSuv#kN^"O)f'. Suitably random. Well, pseudo-random.
This isn't the only way to generate a random string. You could take a similar approach but use the chr function instead.
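A minimal sketch of what the chr version might look like. The function name randomStringChr and the range 33 to 126 (printable ASCII without the space) are my assumptions for illustration, and random_int (PHP 7 and later) is swapped in for mt_rand because the string is meant to be a secret.

```php
<?php
// Build a random string from printable ASCII using chr().
// Assumption: codes 33..126 cover letters, digits and punctuation (no space).
function randomStringChr(int $length = 16): string
{
    $output = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from the operating system's CSPRNG (PHP 7+).
        $output .= chr(random_int(33, 126));
    }
    return $output;
}

echo randomStringChr(16), PHP_EOL;
```

Narrowing or widening the two numbers passed to random_int changes the alphabet without any retyping.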
The output is just as good and you don't have to type out every character you want to use. It was only recently, when working on a puzzle posted by Marco Tabini that I considered using chr. This got me thinking about what other options there were.
The uniqid function is one possibility. It does seem to take about twice as long as the first option listed here though. The manual page also contains this gem:
more_entropy
If set to TRUE, uniqid() will add additional entropy (using the combined linear congruential generator) at the end of the return value, which should make the results more unique.
The emphasis is mine.
Other options include the hashing functions. md5 and sha1 would be two options.
There are a couple of problems with uniqid, md5 and sha1 though. Firstly they all return strings of a set length. You would need to use substr to get a shorter string and chain multiple calls together to get a longer string.
The second problem is that the characters these functions use are limited to hexadecimal digits: the numbers 0-9 and the lowercase letters a-f. You no longer have the uppercase letters, the punctuation, or even most of the lowercase alphabet. That is going to make any string easier to guess.
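Both problems are easy to see in a short sketch. The helper name randomHexString is hypothetical, and seeding sha1 with uniqid and mt_rand is just one way of getting varying input, not a recommendation.

```php
<?php
// Trim or chain a hex digest to reach an arbitrary length.
// Hypothetical helper; the output alphabet is only 0-9 and a-f.
function randomHexString(int $length): string
{
    $output = '';
    while (strlen($output) < $length) {
        // sha1() always returns 40 hex characters, so chain calls for longer strings...
        $output .= sha1(uniqid((string) mt_rand(), true));
    }
    // ...and cut the result back with substr() for shorter ones.
    return substr($output, 0, $length);
}

// The fixed lengths: md5() gives 32 characters and sha1() gives 40.
echo strlen(md5('x')), ' ', strlen(sha1('x')), PHP_EOL; // prints "32 40"
echo randomHexString(16), PHP_EOL;
echo randomHexString(100), PHP_EOL;
```

With only 16 possible values per position, a 16 character string from this helper carries far less variety than 16 characters drawn from the full punctuation-heavy pool.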
Your first idea is sometimes the best
For me I'm perfectly happy generating a random string one character at a time. In the future I'll likely generate random strings in much the same way I always have with one small alteration. I'll replace the long list of characters with chr. There's no point typing more than I have to.
Sunday, June 29. 2008
Windows Live Contacts coming to PEAR
I've spoken previously about Windows Live Contacts but never really did much with it. I didn't have an immediate use for it and I was growing increasingly apathetic about the entire area of contact grabbers / importers. It was a shame, as it was a really exciting project with Microsoft leading the way in the area. It's only recently that Google and Yahoo have caught up and released their own APIs for accessing their users' data.
I've moaned about how great it would be if we could get a user's contacts without having to ask for their password. With services like Windows Live Contacts this is finally possible.
With the possibility of actually using the code myself creeping up on the horizon I decided to put the time in to write wrappers for PHP. It can be broken down into two components.
Windows Live Delegated Authentication: The first thing we need to do is get permission from the user to access their data. There was already a PHP wrapper for this but it did far more than I needed so I've rewritten it and ignored the parts I don't expect to need. This evening I submitted it to the PEAR proposal process.
Windows Live Contacts: The second step is fetching the contacts for the user after you have their permission. I could only find a small test script for this so a more complete implementation was definitely needed. Again, I've just submitted the code for this to the PEAR proposal process.
Both of these packages will likely undergo changes as they go through the proposal process but if you can't wait to get started the files are available to be installed now on the proposal pages. The easiest way is using the PEAR installer. If you haven't used PEAR before please take a look at the manual. If you're still unsure of anything post a comment below.
Saturday, June 7. 2008
Book Review: “Pro PHP: Patterns, Frameworks, Testing and More” by Kevin McArthur
At the start of May I received (along with a couple of other people it seems) a couple of books from Julie Miller at Apress publishing with the sole condition being that I post a short review. Liking to think[1] that I would do this anyway it seemed like an offer I couldn’t refuse. So here goes . . .
When the title talks about patterns, frameworks, testing and more it’s not kidding. Kevin McArthur has managed to stuff a lot of information into the three hundred and some pages which make up this book. The inevitable trade-off is that no one section is a complete introduction to the subject it’s covering. Despite this the book is filled with what I can only describe as, “Ah-hah!” and “Doh!” moments. Explanations that suddenly clear away confusions or present better ways of doing something which in hindsight seem so obvious but clearly weren’t beforehand. If this seems sickeningly positive so far it’s because judging the book as a whole there really isn’t anything I can find to criticise. One criticism that has been raised is that for a book titled “Pro” it doesn’t cover enough “enterprise”-y[2] subjects. Greater emphasis could have been given to some concepts but many of the ideas I associate with “enterprise”-y projects are here. Lacking any general aspects to criticise I’ll “go to town” on the individual sections . . .
Continue reading "Book Review: “Pro PHP: Patterns, Frameworks, Testing and More” by Kevin McArthur"
Monday, May 5. 2008
Posted by Jonathan Street in Misc, PHP Programming, Website Management at 16:41 | Comments (7) | Trackbacks (0)
Pre-populating forms with the timezone
Following my initial discussion on using geo-targeting to predict timezones I've finally found time to play around with some of these ideas.
The idea
Simplify the user experience by predicting the timezone someone is in and auto-populating a registration form accordingly.
The solution
At the time I wasn't really looking for any solutions other than matching an IP address up to an actual location. This is certainly possible and I'll show you how below, but first I'll highlight the alternative. If you are reasonably confident that the timezone will be set correctly on the user's computer, it is possible to access that value via JavaScript.
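On the server side, the value sent by the browser still has to be turned into something usable. The sketch below assumes the page submits the result of JavaScript's Date.getTimezoneOffset() in a form field I have called tz_offset; the field name and both helper functions are hypothetical. PHP's timezone_name_from_abbr is used to look up a zone by offset, and since many zones share an offset the result is only a sensible default for the form.

```php
<?php
// Convert the value of JavaScript's Date.getTimezoneOffset() (minutes,
// positive west of UTC) into seconds east of UTC, the convention PHP uses.
function jsOffsetToSeconds(int $jsOffsetMinutes): int
{
    return -$jsOffsetMinutes * 60;
}

// Guess a timezone identifier for that offset, or null if PHP finds no match.
// Hypothetical helper: treat the result as a pre-selected default only.
function guessTimezone(int $jsOffsetMinutes, bool $isDst = false): ?string
{
    $name = timezone_name_from_abbr('', jsOffsetToSeconds($jsOffsetMinutes), $isDst ? 1 : 0);
    return $name === false ? null : $name;
}

// Assumed form field 'tz_offset', filled in by a script on the page.
$offset = isset($_POST['tz_offset']) ? (int) $_POST['tz_offset'] : 0;
echo guessTimezone($offset) ?? 'UTC', PHP_EOL;
```

Falling back to UTC when nothing matches keeps the form usable for visitors with JavaScript disabled.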
GeoIP to Timezone
If you want to avoid relying on JavaScript then you still have some options.
Maxmind: The PEAR package supporting Maxmind's GeoIP databases doesn't support timezone lookup but the script offered by Maxmind directly does. The usage seems a little convoluted but it is there.
IP2Location: The timezone information is integrated directly in the database once you reach 'DB11', their 11th database offering. The usage should be straightforward. There is only one drawback: DB11 costs $649/year. Personally this is more than I would be willing to spend but their client list is impressive, so if this is within your budget spread the moolah and use my affiliate link.
Free solution: There are still a few options here. Although I haven't tried it, I believe the Maxmind code should work with their GeoLite products. Alternatively, once you have a location, either using Maxmind's free databases or hostip.info, you could try getting to a timezone using the 'world time engine' class available from phpclasses.org. The drawback with this approach is that even after the database lookup you still need to query two web services. That is going to introduce a significant lag in response time.
My preference
Although I started off looking at GeoIP services I would much prefer to be able to tackle this problem using JavaScript. It's not an ideal solution: some people will have their timezone set incorrectly while others will have JavaScript disabled. On balance, though, I think it is good enough. This is the icing on the cake rather than core functionality after all.
Tuesday, April 29. 2008
Posted by Jonathan Street in PHP Programming, Programming, Website Management at 19:47 | Comments (3) | Trackbacks (0)
Considering my development process
This blog has been quiet for a little over a month now as 'real world' events have consumed all my spare time. Overall it has been a month well spent.
- PEAR bug triage: I was able to set aside a little time one weekend for the inaugural PEAR bug triage event. I learnt a lot even if I didn't really achieve much. Definitely something I want to set aside some more time for in the (near) future.
- Edinburgh International Science Festival: I spent a couple of days (attempting) to exhaust those perpetual motion machines most commonly known as children. I believe I failed entirely. The department I'm based in ran a stall focusing on the heart and healthy living. The experience left me exhausted but with renewed hope that the next generation are not entirely the devils the media would have us believe.
- Munich: I spent a long weekend in Munich. It seemed like a really nice city. Best of all the trip was free, which always adds an extra delight to any experience.
- Tai Chi seminar: For the second time I went to a weekend seminar on Tai Chi. I've been attending weekly classes for ~18 months now but find the focus of these weekend seminars to be valuable. Draining but valuable.
- My next project: Over the past few days I've finally found time to start work on a new project I've been thinking about for a couple of months.
A new start
With a new project starting I feel it's a good time to reflect on experiences with past projects and take a look at what are currently considered to be 'best practices'.
Frameworks
The first decision I made was to take a closer look at the plethora of frameworks which have sprung up over the past year or so. I decided to give Zend Framework a try and so far have found it to offer what I need. My main concern was that it would be too inflexible. I've wanted to deviate from the default three or four times now in what I would consider to be non-trivial ways and found that by hitting my code base with the manual a couple of times it would eventually yield to my will. There is certainly a learning curve to master but on balance I believe the benefits will be worth it.
Source Control
Previously my source control has been embarrassingly bad. I still have backups which contain backups which contain old code archives to remind me. I suspect there is useful code in there somewhere but it has reached the point now where finding it is so demoralising that I prefer to pretend it doesn't exist and start again.
Recently I've been playing with Subversion. It's definitely something I want to continue using. Currently I have it running from a slightly flaky old computer I turn on when needed. It still isn't really an ideal solution. What I really want is an always-on service I can connect to from anywhere. To that end I've been looking around for Subversion hosting.
Following the suggestions in a year-old post by Jonathan Snook I've been comparing the offerings available. I've put together an Excel spreadsheet which you can download here or view in Google Docs here.
Hopefully that will save someone a little work. Personally I'm edging towards Assembla, which makes 500 MB of svn space available for free and seems reasonably priced should my needs grow. I would be delighted to hear from anyone who currently uses them or has in the past.
Wednesday, March 19. 2008
Is PHP good enough for science?
My 'day job' has nothing to do with PHP. It has nothing to do with any form of programming. I graduated in 2006 with a degree in Biochemistry and went on to do an MSc and now a PhD in cardiovascular biology. The closest most of my colleagues come to programming is a formula in an Excel spreadsheet.
It was actually Excel which prompted this post. Yesterday I was analysing some data and bemoaning the poor search functionality that Excel makes available. I had already expanded my small set of experimental data with some values pulled from a web service using a quickly hacked together PHP script, and it got me wondering how much better things could be if I just stuck with PHP.
Where's the science?
This train of thought led on to whether PHP has been used all that often for scientific projects. There is an accelerating trend in Biology to make data and tools available via web interfaces. In my opinion this is an environment where PHP excels and yet all the literature I've seen discussing the development of these services uses Perl or occasionally Java.
Searching a little harder for PHP projects yields an equally depressing outlook. In PEAR, Jesus Castagnetto released the Science_Chemistry and Math_Stats packages back in 2003. For my purposes though the Chemistry package is a little too 'chemical' and the stats package is a little too basic. On SourceForge there is a package named BioPHP which looks promising but again there has been no activity since 2003. A lot has happened since then.
Biology is increasingly data generative. There is going to be a steadily increasing need for tools to analyse all this data. These are likely to be centralised and made available via web interfaces.
Anyone out there?
I suspect I'm going to be increasingly creating automated solutions to remove some of the repetition involved in processing the relatively small amounts of data that I generate. A PHP toolkit able to leverage the latest online databases and perform 'advanced' statistics would be immensely valuable.
So my question is this. Is anyone out there using PHP in a scientific environment? Are there resources available which I've missed?
Friday, March 7. 2008
Posted by Jonathan Street in Misc, Programming, Website Management at 21:23 | Comments (0) | Trackbacks (2)
Geotargeting in forms
I have a love-hate relationship with geo-targeting. The web wasn't designed to make it easy to get the geographical location of connected computers. A user's geographical location is interesting and potentially valuable though, and so methods have been developed to make it (almost) possible.
These methods typically involve something akin to a brute force attack. Figure out where enough IP addresses have been assigned and you can get a good idea of where a user is from their IP address. Other methods involve identifying the computers through which they are communicating with you and assuming the user is in the surrounding geographical area. Neither method is perfect but in the majority of cases you can know which country a user is in with reasonable accuracy.
What I hate about geo-targeting is how some sites think they can locate you more accurately than the country you are in. Maxmind, which is probably the commercial leader, thinks it can guess your location to your nearest city with an accuracy of 81% in the US. Outside of the US I suspect this drops considerably. I'm seeing fewer sites than I once did trying to tell me where I'm connecting to the internet from (and getting it wrong) so I'll skip forward to what I love about geo-targeting.
Continue reading "Geotargeting in forms"
Wednesday, March 5. 2008
Posted by Jonathan Street in Misc, Programming, Web Tools, Website Management, Website Promotion at 19:48 | Comments (0) | Trackbacks (0)
BarCampScotland2008 Roundup
Over a full month after the fact I present my summary of the BarCampScotland2008 event.
The event was split over two days. It kicked off on the Friday evening (1st Feb) in the main room of Alison House at the School of Architecture. It was just the one room with a decent communal feeling. I think most people were holding back with their presentations for the following day. Despite this two presentations did take place. James Littlejohn made the first presentation after the welcome session and talked about data portability.
It was a good summary of the current situation. I also happen to agree with most of his positions. He has taken the decision of making his homepage the hub of his social network. Although I think he has perhaps taken things a little too far my main criticism is the implementation. Looking at the site it took me about 20 seconds to figure out that aboyne is where he is based and not his surname. I couldn't find his surname anywhere on his homepage. After navigating around for a while I found it in the byline for his blog. It wasn't an easy process. I highly doubt that data was machine readable despite the importance he attached to this during his talk.
Ewan Spence was next up with an improvised talk he largely made up on the spot. This rapidly migrated to a conversation with some interesting points raised.
Following on from this Dave McClure set a small competition going. An excessive number of random words were gathered from the audience, which then broke up into 5 groups to brainstorm company ideas around any pair of words. Somehow the team I was in won with sexydyslexia.com, a company that takes standard prose and converts it into netspeak and vice-versa. I notice that the domain name is still available, so although it was rated as the best apparently no one in the audience wanted to run with it.
All the details on the second day after the jump . . .
Continue reading "BarCampScotland2008 Roundup"
Sunday, December 2. 2007
Contacting a contact list: A tutorial - revisited
There have been some requests to revisit the contacting a contact list tutorial and include additional features. Some of the requests have been for features mentioned in my original follow-up dealing with some of the potential problems. That's great, but until now I've avoided writing such an article. The main reason is that much of the functionality would need to be deeply integrated with an entire site and so everyone would need a custom solution. I've finally decided that a generic example may make it easier for people to implement their own custom solutions, and as such here it is.
I feel the improvements I'm going to include are really important so to ensure that everyone can make use of this tutorial I'm going to take a step back and develop it in PHP4. I suspect there may still be some using PHP4. This will likely be the last time I worry about compatibility with PHP4. In the new year it will be PHP 5 all the way.
The Original
For those who haven't seen the original article, it:
1) Requested login details for Gmail or MSN Messenger (not the same as Hotmail)
2) Logged in to the service and fetched the contact details
3) Listed all contacts and enabled the user to choose which should be contacted
4) Sent an email to all requested contacts.
Improvements
This updated tutorial will show how to do the above and also the following:
1) Defend against malicious attacks
2) Prevent duplicate messages from being sent
3) Allow recipients to opt out of future messages
If you have read the follow up post to the original tutorial you'll see that the improvements are focused around minimising the problems surrounding unsolicited email rather than improving the efficiency of the process. Those potential improvements are really beyond what can sensibly be included in a generic tutorial like this one.
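To give a flavour of improvements 2 and 3, here is a minimal sketch of filtering a recipient list before sending. It is not the code from the full tutorial: the helper names `normalise_email` and `filter_recipients` are hypothetical, the "already sent" and "opted out" lists are assumed to come from wherever your site stores them, and rejecting line breaks is just one assumed example of defending against a malicious attack (email header injection). It sticks to functions that were already available in PHP4.

```php
<?php
// Hypothetical helper: normalise an address so comparisons are reliable.
function normalise_email($email)
{
    return strtolower(trim($email));
}

// Hypothetical helper: keep only plausible addresses that have not already
// been contacted and have not opted out of future messages.
function filter_recipients($requested, $alreadySent, $optedOut)
{
    $blocked    = array_map('normalise_email', array_merge($alreadySent, $optedOut));
    $recipients = array();

    foreach ($requested as $email) {
        $email = normalise_email($email);

        // Line breaks inside an address are a sign of header injection.
        if (preg_match('/[\r\n]/', $email)) {
            continue;
        }
        // Very loose shape check: something@something.something
        if (!preg_match('/^[^@\s]+@[^@\s]+\.[^@\s]+$/', $email)) {
            continue;
        }
        // Skip opted out, previously contacted and repeated addresses.
        if (in_array($email, $blocked) || in_array($email, $recipients)) {
            continue;
        }
        $recipients[] = $email;
    }

    return $recipients;
}

// Example data only; in a real site these lists would come from storage.
$requested = array(
    'Alice@example.com',
    'bob@example.com',
    'alice@example.com ',
    "eve@example.com\r\nBcc: spam@example.com",
    'carol@example.com',
);
$alreadySent = array('bob@example.com');
$optedOut    = array('carol@example.com');

print_r(filter_recipients($requested, $alreadySent, $optedOut));
```

With the example data above only alice@example.com survives: bob has already been contacted, carol has opted out, the second alice is a repeat and the last entry contains a line break.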
So, without further introduction let's get started. Continue reading "Contacting a contact list: A tutorial - revisited"
Wednesday, November 28. 2007
Dear Santa, please bring me a pony and a plastic rocket and one of those . . .
I'm privileged in that I'm able to 'guide' the choice of some of the gifts I'm likely to receive over Christmas. Given the equally dire state of both my understanding of OOP and the programming sections of my local libraries I feel this is somewhere a good book could help.
From various places on the web I've found three books which may be helpful and would love to get some feedback from the PHP community.
For background, programming is not a full time activity for me and although I can easily knock up a custom CMS I wouldn't describe myself as a seasoned pro. Also, although it has not always been the case, I work almost exclusively in PHP and expect this to be the case for the foreseeable future.
The Books
PHP|architect's Guide to PHP Design Patterns by Jason E. Sweat.
From the reviews it looks like this one may go somewhat off-topic.
PHP 5 Objects, Patterns, and Practice by Matt Zandstra
The reviews seem positive. Probably the most likely candidate at this point.
The Object Oriented Thought Process (Developer's Library) by Matt Weisfeld
From the reviews it may be too brief an introduction.
Have you bought any of the three books above? How useful was it to you? Any other suggestions?
p.s. A cookie for anyone who can name the film I've paraphrased in the post title.
Sunday, November 4. 2007
When scraping content from the web don't make it obvious
A couple of hours ago I was playing around scraping some content from a website. All was going well until suddenly I couldn't get my script to fetch meaningful content. I could view the content perfectly through my browser, on the same IP address, but through PHP it was a no go.
The first thing I did was stop visiting the site for 15 minutes or so and then increase the time between requests. It briefly worked again but quickly stopped. Next, I opened up php.ini and checked what useragent PHP was using. It turned out to be 'PHP'. I changed that and for the past 3 hours (almost) the script has been working perfectly.
Moral of the story: When scraping content from the web don't make it obvious
It's worth noting that I can't say for sure that it was changing the user agent which fixed the problem, it could have just been coincidence, but it's an easy fix and why make it obvious that you're scraping content?
Options
In this instance I was just working from my development server so I had access to php.ini but I had several options.
I could have added a line to my .htaccess file
php_value user_agent "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9"
or used ini_set.
<?php
ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9');
?>
Curl also allows you to specify the useragent.
<?php
// $curl must be a handle from curl_init(); the URL below is only a placeholder.
$curl = curl_init('http://www.example.com/');
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9');
?>
If you want to take cloaking the useragent further with curl this comment in the PHP manual may be useful.
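One more option, for when you'd rather not touch php.ini, .htaccess or the global setting at all, is a stream context that only applies to the requests you pass it to. This is a minimal sketch rather than the script I was actually running: `make_scrape_context` and `fetch_page` are hypothetical helper names, and the timeout and delay values are assumptions you would want to tune.

```php
<?php
// Hypothetical helper: build a stream context that sends a browser-like
// User-Agent for the requests it is passed to, leaving php.ini untouched.
function make_scrape_context($userAgent, $timeout = 10)
{
    return stream_context_create(array(
        'http' => array(
            'user_agent' => $userAgent, // sent as the User-Agent header
            'timeout'    => $timeout,   // give up after this many seconds
        ),
    ));
}

// Hypothetical helper: fetch one page, then pause so that requests are
// spaced out. Returns the page body, or false on failure.
function fetch_page($url, $context, $delaySeconds = 5)
{
    $html = @file_get_contents($url, false, $context);
    sleep($delaySeconds);
    return $html;
}

$context = make_scrape_context(
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9'
);

// Show what will be sent without making any request.
$options = stream_context_get_options($context);
echo 'User-Agent: ', $options['http']['user_agent'], "\n";
```

Calling `fetch_page($url, $context)` in a loop then gives you both the changed user agent and the gap between requests that I tried first.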
Wednesday, October 17. 2007
Posted by Jonathan Street in Misc, Programming, Website Management at 18:50
Comments (5)
Trackbacks (2)
Matt Biddulph discusses the portable social graph at FOWA
As mentioned earlier this is the second, and final, post going into greater detail on one of the sessions at the recent (it's been sitting in my drafts folder for a week) FOWA conference in London.
Matt Biddulph discussed interacting with 3rd party sites and services and the portable social graph. This session was particularly interesting to me given my interest in handling contacts from MSN messenger, Gmail, AOL messenger (AIM) and Yahoo!.
Matt's talk focused on how we can move beyond the use of such scripts, with their multitude of risks, to a situation where a user can join a new site, release the minimum data necessary and quickly identify their friends already using the service.
For those wondering what risks I am talking about the current situation means that for a site to view your contacts in a service they need total access. This means they could view your mail and send mail through your account. Where a username/password combination is used to access other services on a site these would also be compromised, so to take Google as an example you also grant access to Google Checkout, Adsense, Analytics etc. This is the situation today. Even Dopplr, the contact importing functionality of which the conference chairs spoke about with nothing but praise, requires your username and password to import your Gmail contacts.
Luckily, this situation is on the cusp of improving. For services where you are happy to make your list of friends publicly viewable marking them up with microformats is a simple way to allow other services to make sense of your friends list. Matt mentioned that twitter was also thinking of taking this a step further and supporting openid so that a user could prove that a friends list really was their friends list. I'm not terribly familiar with openid but I assume this wouldn't lead to once again proliferating login details and you simply delegate to your actual provider. Correct me if I'm wrong.
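As a rough illustration of what "making sense of your friends list" could look like, here is a minimal sketch that reads XFN-style `rel` values from the links on a public friends page. It assumes the page marks friends up with XFN (`rel="friend"`, `rel="contact"` and so on) and that PHP 5's DOM extension is available; the function name `extract_xfn_contacts` and the sample markup are made up for the example.

```php
<?php
// Hypothetical helper: collect links whose rel attribute carries one of the
// given XFN relationship values.
function extract_xfn_contacts($html, $relationships = array('friend', 'contact'))
{
    $doc = new DOMDocument();
    // Suppress warnings from real-world markup that isn't quite valid.
    @$doc->loadHTML($html);

    $contacts = array();
    foreach ($doc->getElementsByTagName('a') as $link) {
        // rel can hold several space separated values, e.g. "friend met".
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))), -1, PREG_SPLIT_NO_EMPTY);
        if (array_intersect($relationships, $rels)) {
            $contacts[] = array(
                'name' => trim($link->textContent),
                'url'  => $link->getAttribute('href'),
                'rel'  => $rels,
            );
        }
    }
    return $contacts;
}

// Made-up friends list for the example.
$html = '<ul>'
      . '<li><a href="http://alice.example.com/" rel="friend met">Alice</a></li>'
      . '<li><a href="http://bob.example.com/" rel="contact">Bob</a></li>'
      . '<li><a href="http://example.com/about">About</a></li>'
      . '</ul>';

foreach (extract_xfn_contacts($html) as $contact) {
    echo $contact['name'], ' <', $contact['url'], '> (', implode(' ', $contact['rel']), ")\n";
}
```

No username or password changes hands here, which is the whole point: the new site only reads what you have already chosen to publish.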
This works well when you're happy for your contacts to be public but you're probably going to want to keep at least some of the contacts in your email address book private so another solution needs to be found. Thankfully a standard way of achieving this is being developed. In fact, Matt talked about five.
Yahoo! BBAuth
Google AuthSub
flickr authentication
AOL OpenAuth
OAuth
I've also talked previously about Windows Live Contacts Control which, although more limited in scope, does much the same thing for the purposes of this discussion.
All these services, and hopefully there will be convergence in the standards, open up parts of that glorious social graph and they do it in a safe manner. They also work without an absolute demand for javascript, which was one of my main criticisms of Windows Live Contacts when I first spoke about it. Even the Microsoft service now has a RESTful API; either I missed that when I first took a look or it's new.
Hopefully it won't be too long before most of the code on this site is nothing more than an historical curiosity. I don't think we are there yet but soon. . .
The slides from Matt's talk are also now available.
Sunday, October 7. 2007
Posted by Jonathan Street in AJAX, PHP Programming, Programming, Website Management at 15:01
Comments (2)
Trackbacks (0)
Steve Souders discusses high performance websites at FOWA
I've previously mentioned the work Steve Souders is doing evangelizing high performance websites at Yahoo! and I was very pleased to be able to hear him speak at the FOWA conference. Sadly the video from that conference isn't going to be freely available, and although the audio and slides will be, they are not yet online.
Luckily he has previously recorded a video of a very similar (though not identical) presentation and I've embedded it below.
It's 37 minutes so for those in a hurry here are my notes from his presentation at FOWA.
The key points have also been discussed in a series of blog posts so where possible I'll link out to the relevant posts.
Importance of the backend
The first point raised is that the user's perception of load time is more important than the actual load time. This means that the relevant metric is not how fast the HTML document can be returned to the browser but how quickly that HTML is rendered in the browser. In measuring how quickly a page renders it was quickly realised that the backend performance, returning the HTML for the page, accounted for only about 5% of the overall time it took to render the page. Even with a full cache the backend was still only 13% of overall time.
Of the top 10 sites in the US only the backend for Google accounted for more than 20% of the load time with a full cache. The Google homepage is so spartan that with a primed cache only two HTTP requests need to be made.
Cache Usage
The importance of the cache was then discussed, with data presented on how many visitors to a site had a primed cache. They inserted a one pixel image on the Yahoo! homepage and then monitored the number of HTTP requests answered with a 200 status (empty cache) and a 304 status (primed cache).
It was found that 50% of daily users have an empty cache, which accounts for 20% of daily pageviews. This varies depending on the type of site (for example, an empty cache accounts for fewer pageviews on a webmail site where each user will view multiple pages) but is broadly accurate. The data highlights the importance of catering for those users without a primed cache. Excessive use of images can't be justified by the assumption that once they are loaded the cached versions can be used. 50% of your users every day will be arriving at your site with an empty cache.
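To make the 200 versus 304 distinction concrete, here is a minimal sketch of the decision a server makes for that one pixel image. This is my own illustration, not Yahoo!'s code: `image_response_status` is a hypothetical helper, and the list of request headers is made-up sample data standing in for real traffic.

```php
<?php
// Hypothetical helper: decide which status a request for the tracking image
// gets. $ifModifiedSince is the raw If-Modified-Since request header (or
// null when absent); $lastModified is the image's modification time as a
// Unix timestamp.
function image_response_status($ifModifiedSince, $lastModified)
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return 200; // browser has no copy: empty cache
    }
    $since = strtotime($ifModifiedSince);
    if ($since !== false && $since >= $lastModified) {
        return 304; // browser's copy is still good: primed cache
    }
    return 200; // browser's copy is out of date, send the image again
}

$lastModified = gmmktime(0, 0, 0, 1, 1, 2007);

// Made-up sample of If-Modified-Since headers from five requests.
$requests = array(
    null,
    'Mon, 01 Jan 2007 00:00:00 GMT',
    null,
    'Mon, 01 Jan 2007 00:00:00 GMT',
    'Mon, 01 Jan 2007 00:00:00 GMT',
);

$counts = array(200 => 0, 304 => 0);
foreach ($requests as $header) {
    $counts[image_response_status($header, $lastModified)]++;
}

printf(
    "%d of %d requests (%d%%) came from an empty cache\n",
    $counts[200],
    count($requests),
    100 * $counts[200] / count($requests)
);
```

Counting the two status codes in your access logs gives you the same empty versus primed split for your own site.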
iFrame
Next he talked briefly about iFrames and how they can cause a 40-50 ms delay. onLoad doesn't work until the iFrame source responds which can cause a problem with 3rd party content.
YSlow
Next he discussed YSlow, which grades a website based on the 14 rules developed through their research. YSlow is an extension for Firebug, the popular development extension for the Firefox browser. It looks at how the page was built. Despite looking at the content rather than the response time, its score correlates well with the rendering time. As such it could be a valuable tool during development to predict the speed of a site prior to its launch.
Another issue which YSlow apparently solves is a bug in how Firebug charts HTTP requests. Apparently Firebug will show queries to the cache as HTTP requests and YSlow patches this.
That's all I made notes on. I've got vague memories of stepping HTTP requests to increase download speed and cookies are always worth considering but I picked up a nasty cold in London and it's all a bit fuzzy.
Saturday, October 6. 2007
Posted by Jonathan Street in AJAX, Misc, PHP Programming, Programming, Web Tools at 16:18
Comments (3)
Trackbacks (3)
FOWA Shoutout
After flying back to Edinburgh after attending the Future of Web Apps conference in London Thursday night and spending Friday catching up with work it's time for a round up of what happened. There are a couple of topics I'm going to go into greater detail on in future posts but here I present to you the exhibitors, speakers, sites and ideas worthy of mention.
The conference kicked off with a keynote from Om Malik discussing 'What is the Future of Web Apps?' Mike Arrington from Techcrunch decided to gatecrash 15 minutes or so into the keynote. The conversation that followed was interesting though, with the pessimism from Om working well with Mike's optimism. I've been following Techcrunch for a while but have now added GigaOm for the potentially balancing effect.
Ben Forsaith then demoed '10 Real-world apps' in 10 minutes. Surprisingly 9 of the 10 worked without a problem. The most interesting one was probably Buzzword, which is a word processing app. Online office products have been getting a lot of attention recently, with available anywhere functionality playing against the more basic options. Buzzword really grabbed my attention because during the, admittedly short, demo it looked like it could wipe the floor with Microsoft Word when it came to handling images and altering the layout of the page. I frequently have to break reports into 5 or more sections to maintain the layout so if Buzzword performs as well with large files as it did during the demo then it may be goodbye Word. I haven't tried it yet but I've bookmarked it to try later.
Site Speed and User Experience
As I mentioned in the Benchmarks, Site Speed and User Experience post the first speaker of the day following the keynotes was Steve Souders discussing 'High Performance Websites'. Watch for a post discussing this in more detail later.
The quality of speakers stayed high throughout. I think on the first day the most informative/interesting speakers came at the end with Heidi Pollock discussing mobile applications which is an area I haven't previously looked at and John Resig who talked about some of the really interesting things coming up in Firefox.
On the evening of the first day Diggnation was filmed on the keynote stage in front of a packed audience. I've not watched Diggnation before but it was absolutely hilarious live. I think it is only available for premium members at the moment but if you know different let me know as I would like to see what the video version was like.
Day Two
From reading the schedule I wasn't as excited by the second day as I was by the first but there was no need for worry. Simon Wardley got through 300 slides in 30 minutes with a highly engaging talk about commoditisation and utility computing. John Aizen and Eran Shir discussed the semantic web from their work at dapper. Matt Biddulph from dopplr discussed smart integration with third party sites. I'll be going into more detail on this later as well. The final session I went to was with Dick Costolo from feedburner and focused more on the business side but was interesting all the same.
Unfortunately I had to leave before the final keynotes to catch my flight but overall I felt it was a very good conference.
Expo
In addition to the conference there was also the expo hall with some interesting exhibitors.
Fav.or.it may just have what it takes to lure me away from google reader. It hasn't been officially launched yet but from what I saw during a demo it's a very interesting product. It is also built on the Zend framework which makes it worthy of note from a PHP viewpoint.
Widr sounded very promising. It's a geolocation service for the internet. It's going to potentially be more accurate than relying solely on the IP. If I understand the product correctly though I suspect it will always be a niche product as the user needs to install software for it to work. I suspect they also made a mistake in going for a .co.uk domain name rather than .com. The product has global appeal so to me a .com makes more sense.
Xcalibre launched their new flexiscale product which is probably best described as a competitor for Amazon S3 and EC2. It looks like a very interesting product and from a technical perspective I suspect it is the better of the two, but I worry that the strength of the British pound will make it less competitive on pricing.
Finally I'll highlight soup.io which is a blogging platform for less serious content. Probably best described as occupying the market between wordpress.com et al and twitter et al. It's not something I plan on using myself but it looked like a nice product which I could recommend to less web savvy family and friends.
Saturday, September 29. 2007
Posted by Jonathan Street in PHP Programming, Programming, Website Management at 16:16
Comments (4)
Trackbacks (4)
Benchmarks, Site Speed and User Experience
Following on the back of my recent posts looking at the (hopefully) best and worst of benchmarks I thought it would be useful to finish off with some genuine tips for creating 'lightning fast' websites. I probably lack the experience and insight to bring anything new to the table though so instead I'll point you to a selection of interesting articles.
Firstly, let's lay the benchmarking issue to rest. Ben Ramsey, who after his initial outrage at my 7 tips post felt it "actually is really humorous" (probably unjustified praise but thanks anyway!), has a nice post highlighting the code in the PHP source confirming the lack of any difference I demonstrated in my follow up post. Wez Furlong commented on my 7 tips post and highlighted a post he made on benchmarking back in 2005. For anyone feeling my method was excessive his approach gives speedier results. Personally I'd like to see it run in triplicate though.
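Next, as far as the minute differences that 'lightning fast PHP'-style posts are too often built around are concerned, Ilia Alshanetsky probably has the best write-up.
"Please keep in mind that these are not the 1st optimization you should perform. There are some far easier and more performance advantageous tricks, however once those are exhausted and you don't feel like turning to C, these maybe tricks you would want to consider."
Absolutely.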
Getting to articles with tips for that 1st round of optimisations you may want to make there are 13 tips for high performance websites on the Yahoo! developer network. These were written by Steve Souders who, in addition to writing the book 'High Performance Web Sites,' is speaking at the FOWA conference next week. That's one session I definitely want to catch. Hasin Hayder has a follow up post which is definitely worth reading.
Hasin goes into more detail than the Yahoo! article and provides some sample code. A three part series of posts at the IBM developerWorks site takes a PHP focused look at high performance websites and provides some useful instructions on setting up your sites to use the XCache opcode cache, Xdebug and memcache.
Three rules for high performance web sites
For those wanting the abridged version here are my 3 tips for high performance.
1) Fast environment - Start from a position of strength. I didn't post the average speeds in the better benchmarks post because I was looking at the difference rather than the absolute values but the benchmarks were running ten times faster on my web host than on my desktop. There are various reasons why this may be the case (Linux vs Windows XP, system specs, PHP 5.2.3 vs 'evil' PHP 5.2.1) but it doesn't really matter beyond illustrating the need for a good server and host. Other things to consider include an optimizer/opcode cache and gzip compression.
2) Cache everything - Database and web service queries, blocks of content and even your entire page are all fair game.
3) Test everything - Time your code. Profile your code. Test your assumptions (including tips 1 & 2).
Speed doesn't matter
Finally an alternative take because playing devils advocate is fun. Download speed is not how users determine the speed of a site. To the user a site is fast if they can quickly achieve their goal. Steven O'Grady at Red Monk also raises some interesting points contrasting the perspective of the developer and the user.
As always further suggestions, alternative viewpoints and discussion are welcome in the comments below.
Firstly, let's lay the benchmarking issue to rest. Ben Ramsey, who after his initial outrage at my 7 tips post felt it "actually is really humorous" (probably unjustified praise, but thanks anyway!), has a nice post highlighting the code in the PHP source that confirms the lack of any difference I demonstrated in my follow-up post. Wez Furlong commented on my 7 tips post and highlighted a post he made on benchmarking back in 2005. For anyone who felt my method was excessive, his approach gives speedier results. Personally, I'd like to see it run in triplicate though.
Next, on the minute differences that 'lightning fast PHP'-style posts are too often built around, Ilia Alshanetsky probably has the best write-up:
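As a rough illustration of what running a benchmark in triplicate could look like, here is a minimal sketch. The benchmark() helper and its parameters are my own invention for this example, not taken from any of the posts mentioned, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: calls $fn $iterations times and records the elapsed
// wall-clock time, repeating the whole measurement $runs times.
function benchmark(callable $fn, int $iterations = 100000, int $runs = 3): array
{
    $timings = [];
    for ($run = 0; $run < $runs; $run++) {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }
        $timings[] = microtime(true) - $start;
    }
    return $timings; // one elapsed time in seconds per run
}

// Example: three runs of 10,000 md5() calls each.
$timings = benchmark(function () {
    return md5('benchmark');
}, 10000);

printf(
    "min %.4fs, max %.4fs over %d runs\n",
    min($timings),
    max($timings),
    count($timings)
);
```

If the spread between the fastest and slowest run is as large as the difference you are trying to measure, the difference probably isn't worth writing a tip about.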
Please keep in mind that these are not the 1st optimization you should perform. There are some far easier and more performance advantageous tricks, however once those are exhausted and you don't feel like turning to C, these maybe tricks you would want to consider.
Absolutely.
As for articles with tips for that first round of optimisations, there are 13 tips for high performance websites on the Yahoo! developer network. These were written by Steve Souders who, in addition to writing the book 'High Performance Web Sites', is speaking at the FOWA conference next week. That's one session I definitely want to catch. Hasin Hayder has a follow-up post which is definitely worth reading.
Hasin goes into more detail than the Yahoo! article and provides some sample code. A three-part series of posts at the IBM developerWorks site takes a PHP-focused look at high performance websites and provides some useful instructions on setting up your sites to use the XCache opcode cache, Xdebug and memcache.
Three rules for high performance web sites
For those wanting the abridged version, here are my 3 tips for high performance.
1) Fast environment - Start from a position of strength. I didn't post the average speeds in the better benchmarks post because I was looking at the difference rather than the absolute values, but the benchmarks were running ten times faster on my web host than on my desktop. There are various reasons why this may be the case (Linux vs Windows XP, system specs, PHP 5.2.3 vs 'evil' PHP 5.2.1), but it doesn't really matter beyond illustrating the need for a good server and host. Other things to consider include an optimizer/opcode cache and gzip compression.
2) Cache everything - Database and web service queries, blocks of content and even your entire page are all fair game.
3) Test everything - Time your code. Profile your code. Test your assumptions (including tips 1 & 2).
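To make tips 2 and 3 a little more concrete, here is a minimal sketch that caches the result of a slow call and times both the miss and the hit. The cached() and slowQuery() functions are hypothetical names made up for this example, and the cache is a plain in-memory array that only lives for a single request. A real site would more likely use something like memcache, which isn't shown here. It assumes PHP 7 or later.

```php
<?php
// Hypothetical in-memory cache: returns the stored value for $key, or calls
// $compute once and stores its result for later lookups.
function cached(string $key, callable $compute)
{
    static $store = [];
    if (!array_key_exists($key, $store)) {
        $store[$key] = $compute();
    }
    return $store[$key];
}

// Stand-in for a slow database or web service query (an assumption for
// illustration only).
function slowQuery(): array
{
    usleep(50000); // simulate roughly 50 ms of latency
    return ['id' => 1, 'title' => 'Example'];
}

// Time the same lookup twice: the first call misses, the second hits.
foreach (['first (miss)', 'second (hit)'] as $label) {
    $start = microtime(true);
    $row = cached('post:1', 'slowQuery');
    printf("%s: %.1f ms\n", $label, (microtime(true) - $start) * 1000);
}
```

The timing loop is the "test your assumptions" part: if the second lookup isn't noticeably faster than the first, the cache isn't earning its keep.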
Speed doesn't matter
Finally, an alternative take, because playing devil's advocate is fun. Download speed is not how users determine the speed of a site. To the user, a site is fast if they can quickly achieve their goal. Stephen O'Grady at RedMonk also raises some interesting points contrasting the perspective of the developer and the user.
As always, further suggestions, alternative viewpoints and discussion are welcome in the comments below.

