Thursday, May 1. 2008
Why don't geoip services accept feedback?
I've recently been playing around with geoip databases looking at implementing the type of timezone prediction I previously discussed. I'll be writing a blog post to cover that in the next week or so but first wanted to mention something which has been puzzling me.
Why don't geoip services accept feedback?
A good example of accepting feedback is Akismet. There you can report missed spam as spam and messages incorrectly labelled as spam as ham. I couldn't find an official statement on their accuracy but figures as high as 99.9% are mentioned. Defensio, a direct competitor, gives an accuracy rate of 99.77%. From my own experience using Akismet, an error rate of 1 in every 1000 seems about right.
In contrast MaxMind, which is, to my knowledge, the commercial leader in geoip, states its accuracy as:
Over 99% accurate on a country level, 90% accurate on a state level, 81% accurate for the US within a 25 mile radius.
I don't have enough information to question their accuracy in the US but, from personal experience, it falls significantly for the rest of the world.
With feedback from their users I think this accuracy could be substantially increased.
Hostip.info
Hostip.info, which makes its database freely available, does accept feedback from users. It's a simple process with only three screens and a CAPTCHA to get past. Oh, you also need cookies enabled. Not many people are going to use this. What is really needed is an automated process, and this system is set up to prevent automated processes.
My 2 cents
Geoip services have the potential to more efficiently utilise their user communities than even the anti-spam services can. Internet users regularly submit their location on forums, social networks and whenever they make a purchase. These are all situations where predictions based on geoip lookups can create a more pleasant user experience. If the user doesn't change the predicted location then all is well. If the user does change the location though then that information should be sent back to the geoip service so the database can be updated.
Analysis of hostnames will only get you so far. To really boost the accuracy you need human reviewers. If you can convert ordinary internet users into reviewers without inconveniencing them then that's even better.
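To make the feedback loop described above a little more concrete, here is a minimal sketch of how a site might collect corrections before passing them on. Everything in it is an assumption for illustration: the FeedbackStore class, the grouping of addresses by their first three octets and the min_reports threshold are all hypothetical, and no geoip provider is known to accept data in this form.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


def prefix(ip):
    """Group IPv4 addresses by their first three octets (an assumed granularity)."""
    return ".".join(ip.split(".")[:3])


@dataclass
class FeedbackStore:
    """Hypothetical in-memory store of user corrections, keyed by IP prefix."""
    min_reports: int = 3
    corrections: dict = field(default_factory=lambda: defaultdict(Counter))

    def record(self, ip, predicted, submitted):
        """Store a correction only when the user changed the predicted location."""
        if submitted == predicted:
            return False  # prediction accepted, nothing to report
        self.corrections[prefix(ip)][submitted] += 1
        return True

    def suggested_update(self, ip):
        """Return the most reported location once enough users agree, else None."""
        counts = self.corrections.get(prefix(ip))
        if not counts:
            return None
        location, reports = counts.most_common(1)[0]
        return location if reports >= self.min_reports else None
```

Waiting for several independent reports before suggesting an update is one way to stop a single mistaken or malicious submission from changing the database.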
Wednesday, March 5. 2008
Posted by Jonathan Street in Misc, Programming, Web Tools, Website Management, Website Promotion at 14:48
BarCampScotland2008 Roundup
Over a full month after the fact I present my summary of the BarCampScotland2008 event.
The event was split over two days. It kicked off on the Friday evening (1st Feb) in the main room of Alison House at the School of Architecture. It was just the one room with a decent communal feeling. I think most people were holding back with their presentations for the following day. Despite this two presentations did take place. James Littlejohn made the first presentation after the welcome session and talked about data portability.
It was a good summary of the current situation. I also happen to agree with most of his positions. He has taken the decision to make his homepage the hub of his social network. Although I think he has perhaps taken things a little too far, my main criticism is the implementation. Looking at the site it took me about 20 seconds to figure out that Aboyne is where he is based and not his surname. I couldn't find his surname anywhere on his homepage. After navigating around for a while I found it in the byline for his blog. It wasn't an easy process. I highly doubt that data was machine readable despite the importance he attached to this during his talk.
Ewan Spence was next up with an improvised talk he largely made up on the spot. This rapidly migrated to a conversation with some interesting points raised.
Following on from this Dave McClure set a small competition going. An excessive number of random words were gathered from the audience, which then broke up into 5 groups to brainstorm company ideas around any pair of words. Somehow the team I was in won with sexydyslexia.com, a company that takes standard prose and converts it into netspeak and vice-versa. I notice that the domain name is still available so although it was rated as the best apparently no one in the audience wanted to run with it.
All the details on the second day after the jump . . .
Continue reading "BarCampScotland2008 Roundup"
Saturday, October 6. 2007
Posted by Jonathan Street in AJAX, Misc, PHP Programming, Programming, Web Tools at 11:18
Comments (3)
Trackbacks (3)
FOWA Shoutout
Having flown back to Edinburgh on Thursday night after attending the Future of Web Apps conference in London, and spent Friday catching up with work, it's time for a round-up of what happened. There are a couple of topics I'm going to go into greater detail on in future posts but here I present to you the exhibitors, speakers, sites and ideas worthy of mention.
The conference kicked off with a keynote from Om Malik discussing 'What is the Future of Web Apps?' Mike Arrington from TechCrunch decided to gatecrash 15 minutes or so into the keynote. The conversation that followed was interesting though, with the pessimism from Om working well with Mike's optimism. I've been following TechCrunch for a while but have now added GigaOm for the potentially balancing effect.
Ben Forsaith then demoed '10 Real-world apps' in 10 minutes. Surprisingly 9 of the 10 worked without a problem. The most interesting one was probably Buzzword, which is a word processing app. Online office products have been getting a lot of attention recently, with available-anywhere functionality playing against the more basic options. Buzzword really grabbed my attention because during the, admittedly short, demo it looked like it could wipe the floor with Microsoft Word when it came to handling images and altering the layout of the page. I frequently have to break reports into 5 or more sections to maintain the layout so if Buzzword performs as well with large files as it did during the demo then it may be goodbye Word. I haven't tried it yet but I've bookmarked it to try later.
Site Speed and User Experience
As I mentioned in the Benchmarks, Site Speed and User Experience post the first speaker of the day following the keynotes was Steve Souders discussing 'High Performance Websites'. Watch for a post discussing this in more detail later.
The quality of speakers stayed high throughout. I think on the first day the most informative/interesting speakers came at the end with Heidi Pollock discussing mobile applications which is an area I haven't previously looked at and John Resig who talked about some of the really interesting things coming up in Firefox.
On the evening of the first day Diggnation was filmed on the keynote stage in front of a packed audience. I've not watched Diggnation before but it was absolutely hilarious live. I think it is only available for premium members at the moment but if you know different let me know as I would like to see what the video version was like.
Day Two
From reading the schedule I wasn't as excited by the second day as I was by the first but there was no need for worry. Simon Wardley got through 300 slides in 30 minutes with a highly engaging talk about commoditisation and utility computing. John Aizen and Eran Shir discussed the semantic web from their work at dapper. Matt Biddulph from dopplr discussed smart integration with third party sites. I'll be going into more detail on this later as well. The final session I went to was with Dick Costolo from feedburner and focused more on the business side but was interesting all the same.
Unfortunately I had to leave before the final keynotes to catch my flight but overall I felt it was a very good conference.
Expo
In addition to the conference there was also the expo hall with some interesting exhibitors.
Fav.or.it may just have what it takes to lure me away from Google Reader. It hasn't been officially launched yet but from what I saw during a demo it's a very interesting product. It is also built on the Zend framework which makes it worthy of note from a PHP viewpoint.
Widr sounded very promising. It's a geolocation service for the internet. It's going to potentially be more accurate than relying solely on the IP. If I understand the product correctly though I suspect it will always be a niche product as the user needs to install software for it to work. I suspect they also made a mistake in going for a .co.uk domain name rather than .com. The product has global appeal so to me a .com makes more sense.
Xcalibre launched their new flexiscale product which is probably best described as a competitor to Amazon S3 and EC2. It looks like a very interesting product and from a technical perspective I suspect it is the better of the two, but I worry that the strength of the British pound will make it less competitive on pricing.
Finally I'll highlight soup.io which is a blogging platform for less serious content. Probably best described as occupying the market between wordpress.com et al and twitter et al. It's not something I plan on using myself but it looked like a nice product which I could recommend to less web savvy family and friends.
Saturday, July 21. 2007
Xenu : Stats aggregation for any site
I've previously mentioned popuri.us as one of the better examples of a website stats aggregator but I think my loyalty is switching to Xenu.
It was initially flagged by techcrunch about a week ago. At the time it was struggling under the onslaught of being highlighted both on techcrunch and elsewhere. A few days later seomoz brought to my attention that the load had got so bad that the creator felt unable to cope and had released the source code to the community. There are now 13 mirrors you can use including Italian, German, Bulgarian, French and Dutch versions.
Source Code
The backend is all PHP while the frontend relies heavily on javascript. The source code is an interesting, though far from easy, read. Some of the functionality is in my opinion needlessly excessive. I really don't see the point of being able to drag the stats boxes around the screen for instance. The source code would also benefit from descriptive filenames, principally in the results folder where stats are returned by 46 files cunningly named 1-46.
Despite this there are some real insights to be had for anyone interested in scraping these sorts of stats. For instance a stunningly simple method to grab the alexa rank is used which I hadn't come across before. It doesn't involve paying to access the API and you don't need to wrestle a css file into submission to extract the rank from the alexa site.
The Spoiler
The alexa data is returned in file number 7. Just in case you're not overly thrilled by the notion of opening all 46 files to find the one that accesses the service you're particularly interested in there is a useful key in the following file - js/general_without_encryption.js
Sunday, July 8. 2007
Web Scraping at Seomoz
Scraping content from the web is a real pain. In fact, at the moment, I can't think of any programming tasks I like less. I'm sure they exist, I just can't think of them. Feel free to remind me in the comments.
I'm setting up a tool at the moment that is going to use a fair amount of web scraping. Sadly not all sites are kind enough to provide an API like Compete and some actively work to make your life more difficult. The three key points that I feel make web scraping a real pain are:
- Undocumented interface: We're dealing with essentially unstructured HTML here. Web scraping also highlights just how little many companies care about validation and writing readable code. I'm assuming their code is a pain to read because they compact their files to save on bandwidth. If not then maintaining these sites could be an even bigger pain than scraping them.
- No warning of changes: You're essentially at the mercy of the designer. If they change their layout your code breaks.
- Timeouts and blank pages: Even with all the code in place you still may get no result. The site may be down or if you're being too aggressive your requests may be throttled. Even if you do get to a page it may not be the one you want. Does a blank page mean the site has no info for the query you made or does it mean there was some sort of error? If the site is using 'soft' error messages it may be difficult to know.
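The blank page question in the last point can at least be made explicit in code. This is only a rough sketch: the marker phrases are made up for illustration and would need to be replaced with whatever the site being scraped actually returns. It treats a blank page as an error on the assumption that it is safer to retry than to record "no data".

```python
from enum import Enum


class Outcome(Enum):
    DATA = "data"
    NO_DATA = "no data"
    ERROR = "error"


# Hypothetical phrases; real sites will each have their own wording.
SOFT_ERROR_MARKERS = ("temporarily unavailable", "too many requests", "please try again")
NO_DATA_MARKERS = ("no results", "no data available")


def classify(status, body):
    """Decide whether a fetched page holds data, an explicit 'nothing found' or an error."""
    text = body.strip().lower()
    if status != 200:
        return Outcome.ERROR
    if any(marker in text for marker in SOFT_ERROR_MARKERS):
        return Outcome.ERROR  # a 'soft' error served with a 200 status
    if not text:
        return Outcome.ERROR  # blank page is ambiguous, so assume it is worth a retry
    if any(marker in text for marker in NO_DATA_MARKERS):
        return Outcome.NO_DATA
    return Outcome.DATA
```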
With my recent work on web scraping I found the latest 'Whiteboard Friday' video from seomoz to be exceptionally timely. I generally read the seomoz blog for the discussion on marketing issues but they also have a couple of tools which are heavily reliant on web scraping and on Friday the issues surrounding this were discussed. Apparently they've been having trouble providing data for all their visitors due to the ever increasing demand. As well as the video Matt, who is their CTO and web developer, goes into a fair amount of detail on their process and some of it really depressed and/or surprised me:
One of the most unreliable APIs I've had to deal with is the Alexa / Amazon API, which is funny because it's the only one that costs money.
Sadly this isn't the first time I've heard this. The costs aren't unreasonable but if you're paying for something I think it is fair to expect a higher standard of service than if it was free.
This entire [data fetching] process is repeated between 2-7 times with varying timeout lengths and user-agents until some kind of data is fetched.
7 attempts strikes me as amazingly high. It would be interesting to know the delay between each request.
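For anyone wanting to experiment with the same idea, a retry loop along these lines might look something like the sketch below. It is not seomoz's code. The user-agent strings, the growing timeout and the pause between attempts are all assumptions (the quote says nothing about delays), and the fetch function is passed in so the loop can be tried without touching a real site.

```python
import time

# Hypothetical user-agent strings, rotated between attempts.
USER_AGENTS = [
    "ExampleScraper/1.0",
    "Mozilla/5.0 (compatible; ExampleBot/1.0)",
]


def fetch_with_retries(fetch, url, max_attempts=7, base_timeout=5.0,
                       delay=1.0, sleep=time.sleep):
    """Call fetch(url, timeout=..., user_agent=...) until it returns some data.

    Returns (body, attempts_used); body is None if every attempt failed.
    """
    for attempt in range(max_attempts):
        timeout = base_timeout * (attempt + 1)           # lengthen the timeout each time
        agent = USER_AGENTS[attempt % len(USER_AGENTS)]  # vary the user-agent
        try:
            body = fetch(url, timeout=timeout, user_agent=agent)
        except OSError:                                  # covers timeouts and socket errors
            body = None
        if body:
            return body, attempt + 1
        if attempt < max_attempts - 1:
            sleep(delay * (attempt + 1))                 # back off before the next try
    return None, max_attempts
```

With these defaults a request that fails every time would tie things up for well over two minutes in timeouts alone, which is one reason 7 attempts seems high.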
Experience so far
I've got two scripts/tools running at the moment that require data from external sources.
The first is the msn contact grabber script which can fail after half a dozen or fewer requests on a single IP address. To make matters worse the interface is complex and poorly documented. Thankfully it is stable.
The second is the dnsbl checker which I haven't had fail on me yet. It utilises the dns system and is designed for high use. Even if the tool did become insanely popular the demand placed on external services could be limited by zone transfers. The interface is also so simple that documentation really isn't needed.
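To show just how simple the interface is, here is a short sketch of a DNSBL lookup. This is not the code behind the checker itself. The convention of reversing the octets of the IP address and appending the blacklist zone is standard, but the zone name in the usage note is a placeholder, and the resolver is passed in as a parameter purely so the function can be tried without making real DNS queries.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the hostname to look up: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything not IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone has an A record for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No record found. Note that a resolver failure looks the same here.
        return False
    return True
```

Calling dnsbl_query_name("192.0.2.99", "dnsbl.example.org") gives "99.2.0.192.dnsbl.example.org", and checking many lists is just a loop over zone names.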
My experience seems to be at the two extremes of scraping. I'm hoping my current work will be more dnsbl checker than msn contact grabber. Maybe I should just drop Alexa data?
Wednesday, June 20. 2007
Release Early, Release Often - Yeah Right . . .
For a couple of weeks now I've been intending to release two new tools for this site. For one week they've been live on the site (though not linked to) and still I haven't mentioned them.
It could be said I'm procrastinating. This is a shame because as updates go this is even more significant than the dedicated pages for the scripts I've put together.
css/js compaction
The first tool is a port of the original js/css compaction tool and represents the penultimate migration of code from the old domain. All that is left now is the msn contacts web service which I currently have no plans to move across. This tool is basic and if you're working with several javascript or css files and making modifications frequently then there are probably better tools available. For the majority of people, who don't alter their javascript or css files very often and just want to save a little bandwidth or reduce their page loading times, this tool could be ideal.
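As a rough idea of what basic css compaction involves, here is a small sketch. It is not the code behind the tool, just an illustration of the general approach: strip comments, collapse whitespace and trim the space around punctuation. It makes no attempt to protect quoted strings, so it would mangle something like a content property containing spaces or comment markers.

```python
import re


def compact_css(css):
    """Naive css compaction: fine for simple stylesheets, unsafe for quoted strings."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove comments
    css = re.sub(r"\s+", " ", css)                        # collapse runs of whitespace
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)           # trim around braces and separators
    css = re.sub(r":\s+", ":", css)                       # trim after colons only
    css = css.replace(";}", "}")                          # drop the final semicolon in a block
    return css.strip()
```

Whitespace before a colon is left alone because in a selector such as "div :hover" it is significant.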
Check your spam status
The second tool is the most interesting and certainly required the most work. It's taking the ideas developed in these two posts about checking your rbl/spam status and automating the process to their logical conclusion. This second tool allows scheduled checking and automated updates of any change. At present it checks 75 real-time blacklists of varying degrees of importance. Furthermore it will check these lists up to once a week on your behalf.
It's a simple tool but it's easy to use and should be a big time saver. Expect more of the same . . . soon.
p.s. - I already have some ideas for additional tools but if you are having trouble finding a tool you think will make your life simpler then get in touch as I may be interested in building it as a part of this site. Easiest way to contact me is via a comment. If you don't want it to be public just state you want it to remain private in the message.
Saturday, April 7. 2007
Popuri.us : Stats aggregation for any site
I came across popuri.us a while ago and have been holding off posting about it because it seemed a little unstable after being highlighted by TechCrunch.
It reminds me a lot of the page strength tool at seomoz. The metrics reported aren't all the same and you don't get an overall score with popuri.us as you do with seomoz but it is another very good tool for gathering info on a url.
Integrating popularity ranks other than Alexa is a good move for popuri.us. Alexa is known to be biased towards the webmaster type crowd so looking at other such metrics makes sense. For sites with an rss feed it also fetches the number of bloglines subscribers. It would be nice to see data being collected from a few other online feed readers but overall this is a really nice tool. It has certainly earned a place in my bookmarks.
Saturday, February 3. 2007
Is my MSNM contacts script obsolete?
I keep a fairly close eye on my server logs. It's a boring process but undoubtedly worth it. You never know what you might find.
I've recently got a few visitors from mytton.net, the website for David Mytton. It turns out that he has recently been looking at implementing some contact importing functionality on a project he is working on. Continue reading "Is my MSNM contacts script obsolete?"
Monday, November 27. 2006
Source code for the contacts web service
There have been a couple of requests for the source code to the MSN contacts grab web service, so that anyone interested can run the web service on one server they have access to while querying it from other servers they use. This sounds like a good idea to me so here are the details you need.
Continue reading "Source code for the contacts web service"
Saturday, November 11. 2006
'Inbound links by page' tool posted by DP member
There was a good post by 'mass nerder' on the digitalpoint forums on Wednesday highlighting a new tool he had just created. The idea is that it breaks down the inbound links to a site based on the page they are directed at.
It's quite a useful tool to identify which pages on your site have received the most attention. It can also be used to analyse the linking strategy of your competitors.
It is still a little rough at the edges just now (there are still some errors being returned) but I think the tool has a lot of potential. A more refined analysis capability would be helpful but it may be added once the bugs are worked out.
The thread is here and the tool can be tried here.
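For anyone curious what "inbound links by page" means in practice, here is a minimal sketch in Python. I don't know how the tool itself is implemented, so this is only an illustration of the idea: the function name `inbound_links_by_page`, the `(source_url, target_url)` input format and the example hostnames are all my own assumptions. It assumes you have already collected the list of links from somewhere.

```python
from collections import Counter
from urllib.parse import urlsplit


def inbound_links_by_page(links, site_host):
    """Count external inbound links per target page on site_host.

    links: iterable of (source_url, target_url) pairs. This input format is
    an assumption made for the sketch, not something the tool documents.
    Returns a list of (path, count) tuples, most linked page first.
    """
    counts = Counter()
    for source_url, target_url in links:
        source = urlsplit(source_url)
        target = urlsplit(target_url)
        if target.hostname != site_host:
            continue  # link does not point at our site
        if source.hostname == site_host:
            continue  # internal link, not an inbound one
        # Treat an empty path as the home page.
        counts[target.path or "/"] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample data, purely for illustration.
    sample = [
        ("http://a.example/post", "http://mysite.example/"),
        ("http://b.example/x", "http://mysite.example/tools"),
        ("http://c.example/y", "http://mysite.example/tools"),
    ]
    for path, count in inbound_links_by_page(sample, "mysite.example"):
        print(path, count)
```

Running the same function against a competitor's hostname would give the per-page breakdown of their inbound links, provided you have the link data for their site.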
Saturday, October 28. 2006
Page strength vs google pagerank
I've recently come across an interesting tool available at seomoz. The page strength tool is designed to give a more thorough analysis of the importance of a page than the Google PageRank.
It looks at a host of factors, including the Google PageRank, to give a more detailed and more up-to-date analysis of a page. It takes a bit longer to gather the data (there are more sources to query) but I think it is a more useful tool than just looking at the Google PageRank.
Saturday, October 7. 2006
Development on the MSNM contacts grabber
I seem to be getting a lot of visitors to my post presenting a script for grabbing an MSN messenger contact list. There have also been plenty of comments which is again good news. They have however brought up a few issues.
Continue reading "Development on the MSNM contacts grabber"
Tuesday, August 22. 2006
Webthumb rendering engine released
I first heard about Webthumb a few weeks ago and thought at the time what a brilliant tool it was. Since then I've been looking forward to playing with the rendering engine it uses, and I'm pleased to say it has now been released. Webthumb basically takes a snapshot of a site and allows you to download the image as a JPEG. That is useful if you want to comment on the design of a site or save an image for future reference. It could also be a useful feature for link directories and the like.
Having said that, the web interface is not the best approach if you intend to take a screenshot of every site in even a moderately sized link directory. The release of the rendering engine is therefore a welcome development. Released so far is just the rendering engine, not the code that takes the produced image and displays it on the website. If I remember correctly the author is not intending to release that code, but I can't find where he says that so I may be wrong.
Monday, July 24. 2006
File compaction
Over the past few days I have been thinking about how to squeeze the most out of our bandwidth. Whether you use a shared hosting account or have a dedicated server, you'll be paying for the bandwidth you use, so keeping your usage to a minimum seems like a no-brainer as long as it doesn't affect the service you offer. I've come up with a few ways of doing just this.
Continue reading "File compaction"

