I quite like this article posted at Devpapers, which looks at displaying the Google pagerank for a site. The only problem is that I find it quite limited.
The current class would be fine for a small site without much traffic, but if you want to use the script on a larger site you are going to be connecting to the Google servers thousands of times each day. There are two problems with this: firstly, Google is unlikely to be happy, and secondly, your pages are going to load more slowly than they could.
Rather than connecting to the Google servers every time someone loads a page, we can get the pagerank from Google once each day (or even each week, as pagerank doesn’t change all that often) and store the value in a database or a file. Then, when someone visits the page, the value is retrieved from the database or file rather than from Google. Essentially we are caching the data: accessing a locally stored version is much faster than connecting to the Google servers.
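To make the idea concrete, here is a minimal sketch of time-based file caching in plain PHP. It is not the code used in the class (that relies on Cache_Lite); the function name `cached_value`, the temporary file name and the placeholder value are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, not part of the original class: return the cached
// value if the cache file is younger than $lifetime seconds, otherwise
// call $fetch once and store its result.
function cached_value($cacheFile, $lifetime, callable $fetch)
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $lifetime) {
        // Cache hit: read the locally stored copy.
        return file_get_contents($cacheFile);
    }

    // Cache miss or stale entry: do the slow lookup and store the result.
    $value = (string) $fetch();
    file_put_contents($cacheFile, $value, LOCK_EX);

    return $value;
}

// Example: the closure stands in for the slow remote lookup.
$file = sys_get_temp_dir() . '/pagerank_example.cache';
$rank = cached_value($file, 604800, function () {
    return '4'; // placeholder value, not a real pagerank
});
```

The remote lookup only runs when the file is missing or older than the lifetime, so a busy page reads a small local file on almost every request.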
The purpose of this rewrite is to introduce caching and to separate getting the pagerank value from displaying it. I am not going to look at storing the value in a database because, depending on the project, the table structure and the interface to the database will be different.
function query_google($url)
{
    $ch = "6" . $this->GoogleCH($this->strord("info:" . $url));
    $fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
    } else {
        $out = "GET /search?client=navclient-auto&ch=" . $ch . "&features=Rank&q=info:" . $url . " HTTP/1.1\r\n";
        $out .= "Host: www.google.com\r\n";
        $out .= "Connection: Close\r\n\r\n";
        fwrite($fp, $out);
        while (!feof($fp)) {
            $data = fgets($fp, 128);
            $pos = strpos($data, "Rank_");
            if ($pos !== false) {
                $pagerank = substr($data, $pos + 9);
                $this->pr_val = $pagerank;
            }
        }
        fclose($fp);
    }
}
Ignoring a few additional variables at the start of the class, the first major variation is in the printrank function, which I have renamed 'query_google'. The only difference in this function is that instead of passing the value to pr_image to output the image code, I store it in a variable.
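The parsing step inside 'query_google' can be looked at in isolation. The sketch below assumes the interesting response line has the form `Rank_1:1:4`, which is what the offset of 9 in the function implies; the helper name `parse_rank_line` and the extra validation are my own additions and are not part of the class.

```php
<?php
// Hypothetical helper: pull the rank out of a single response line.
// Assumes a line of the form "Rank_1:1:4"; returns null when the line
// holds no usable value.
function parse_rank_line($line)
{
    $pos = strpos($line, 'Rank_');
    if ($pos === false) {
        return null;
    }

    // Skip the 9-character "Rank_1:1:" prefix, as query_google does, and
    // strip the line ending that fgets() leaves in place.
    $value = trim(substr($line, $pos + 9));

    // Only accept a plain whole number.
    return ctype_digit($value) ? (int) $value : null;
}
```

Returning null for anything that is not a whole number makes it easy to avoid storing a failed lookup in the cache.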
function get_rank($url)
{
    // Include the package
    require_once('Cache/Lite.php');

    // Set a few options
    $options = array(
        'cacheDir' => $this->cache_dir,
        'lifeTime' => $this->cache_time
    );

    // Create a Cache_Lite object
    $Cache_Lite = new Cache_Lite($options);

    // Test if there is a valid cache for this id
    if ($data = $Cache_Lite->get($url)) {
        // Cache hit: use the stored value
        $this->pr_val = $data;
    } else {
        // Cache miss: query google and store the result for next time
        $this->query_google($url);
        $Cache_Lite->save($this->pr_val);
    }
}
I have deleted the 'get_pr' function and added some additional functions. The first is 'get_rank', which handles the caching. Firstly I include the Cache_Lite package. I have the path to the PEAR packages in the include_path value for my server, so I can easily include the package here. Having included the package, I set the options for the cache directory and the lifetime of the cached files, ready to create a new instance of the class. Once that is created I use an 'if' statement to check whether the pagerank value has already been stored in the cache. If it hasn't been stored in the cache, or the stored value is too old, I call the 'query_google' function to get the value from Google and then save it in the cache ready for the next time I need it.
//Functions to actually be used by the user

//Get a numerical value for the pagerank
function Qpr_num($url) {
    //Get the pagerank (from the cache if possible, otherwise from google)
    $this->get_rank($url);
    //Return the numerical value
    return $this->pr_val;
}

//Return the html code to display an image
function Qpr_img($url) {
    //Get the pagerank (from the cache if possible, otherwise from google)
    $this->get_rank($url);
    //Get image code
    $this->pr_image($this->pr_val);
    //Return the image code
    return $this->pr;
}
The last two functions simply pull everything together and simplify the process of getting either a value for the pagerank of a site or the HTML code to display an image.
<?PHP
//Get the pagerank checking class
include("pagerank.class.php");

//Get pagerank for site
$gpr = new pageRank();

//You can easily modify the cache folder at runtime
//$gpr->cache_dir = "new_folder";

//A similar method works for the cache lifetime
//86400 is 1 day
//604800 is 1 week - This is the default
//2419200 is 4 weeks
//$gpr->cache_time = 86400;

//Get a value for the pagerank of a page
$gpr_val = $gpr->Qpr_num("http://jmstreet.info");

//Get html code to display an image
$gpr_img = $gpr->Qpr_img("http://jmstreet.info");

echo "Google pagerank is " . $gpr_val;
echo $gpr_img;
?>
Usage is simple enough but it is important to ensure that you can include files from a PEAR installation in your scripts and that you have a folder set up to store the cache files. If necessary you can change the name of the folder where you store the cache files either in the class file or by modifying the variable during usage. You will need to make sure you have a copy of Cache_Lite in your PEAR installation.
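If you are not sure whether those two requirements are met, a quick check along the following lines may help. This is only a sketch: the function name `check_cache_setup`, the messages and the folder name `cache/` are assumptions made for the example and are not part of the class.

```php
<?php
// Hypothetical pre-flight check; the name and messages are illustrative.
function check_cache_setup($cacheDir, $package = 'Cache/Lite.php')
{
    $problems = array();

    // stream_resolve_include_path() returns false when the file cannot be
    // found on the include_path (available from PHP 5.3.2).
    if (stream_resolve_include_path($package) === false) {
        $problems[] = "Cannot find $package on the include_path";
    }

    if (!is_dir($cacheDir)) {
        $problems[] = "Cache folder $cacheDir does not exist";
    } elseif (!is_writable($cacheDir)) {
        $problems[] = "Cache folder $cacheDir is not writable";
    }

    return $problems;
}

// Example usage with a placeholder folder name.
foreach (check_cache_setup('cache/') as $problem) {
    echo $problem . "\n";
}
```

An empty result means the package can be found and the folder can be written to; anything else lists what needs fixing before the class is used.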
The script can be downloaded here.