How to Fetch Page Content Using PHP

The web has so many useful services now, many offering their own API that you can draw data from and create your own mashups with. Websites like Twitter, Facebook, Flickr, and Kongregate are just a few.

Their data output is most commonly formatted as RSS (for feed readers) or JSON (a lighter-weight format than XML, used heavily in AJAX). In this post I’ll show you how to use the cURL library to collect data from any public web page.

First, I’ve set up a function that takes in the URL you want to use as a parameter and returns the content of the page. This will make it easier later on, particularly if you want to call more than one page in a single script (like feeding Twitter and Facebook status updates into one stream).

Making a Connection

function get_page_content($url){
  $resource = curl_init();
 
  curl_setopt($resource, CURLOPT_URL, $url);
  curl_setopt($resource, CURLOPT_HEADER, false);
  curl_setopt($resource, CURLOPT_RETURNTRANSFER, true);
 
  $content = curl_exec($resource);
 
  curl_close($resource);
 
  return $content;
}

Line 2: Initialises a cURL session and gives us a handle we can use with the other cURL functions.

Line 4: Sets the URL.
Line 5: Sets CURLOPT_HEADER to false, so the response headers aren’t mixed in with the content we want.
Line 6: Sets CURLOPT_RETURNTRANSFER to true. Without this option, curl_exec() prints the content straight to the page rather than returning it, which we don’t want.

Line 8: Execute the request using the options from lines 4-6 and store the result in a holding variable. If we hadn’t set CURLOPT_RETURNTRANSFER to true, curl_exec() would just return true on success instead of the page content.

Lines 10-12: Close the cURL session and return the holding variable.

So, for example, we can collect JSON formatted data from a user’s latest Twitter updates:

Decoding the Data

$twitter_content = get_page_content(
  "http://api.twitter.com/1/statuses/user_timeline.json?screen_name=_bigSteve");
 
$tweets = json_decode($twitter_content, true);

Passing true as the second parameter to json_decode() gives us the data as an associative array rather than as an object.
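
To see the difference that second parameter makes, here’s a minimal sketch using some made-up JSON in the same shape as the timeline data:

```php
<?php
// Sample JSON in the same shape as the Twitter timeline (made up):
$json = '[{"text":"First update"},{"text":"Second update"}]';

$as_objects = json_decode($json);        // array of stdClass objects
$as_arrays  = json_decode($json, true);  // array of associative arrays

echo $as_objects[0]->text;   // object property access: "First update"
echo $as_arrays[0]['text'];  // array key access: "First update"
```

Both forms hold the same data; the associative array version just matches the $tweet['text'] style used below.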

We can then display this data as we usually would, looping through the elements:

Displaying the Text

echo "<ol>";
 
foreach($tweets as $tweet){
 
  echo "<li>".
    htmlspecialchars($tweet['text'], ENT_COMPAT, 'UTF-8').
    "</li>\n";
 
}
 
echo "</ol>";

The htmlspecialchars() function escapes special characters into their HTML entities, and should always be used when sending text like this out to the web page.

After the text we want to escape, the other two parameters set the quote handling and the character encoding; I’ve found ENT_COMPAT and 'UTF-8' to work in most cases, but you can change these if you get some strange characters.

So when you run the script on your own website, it will display an ordered list of Twitter updates on the page. You can adapt the code above to pull data from other APIs, in particular by reusing the get_page_content function we created.

If you’re looking to parse XML documents like RSS feeds, then SimpleXML is a great library for that. When given the option though, it’s almost always best to go with JSON, so have a poke around for it in the API docs before you set off.
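
If you do end up with XML, a SimpleXML version of the display loop could look something like this (a rough sketch — the feed URL is a placeholder, and the element structure assumes a standard RSS 2.0 feed):

```php
<?php
// Placeholder feed URL; reuses the get_page_content function from above.
$rss = get_page_content("http://example.com/feed.rss");

// simplexml_load_string returns a SimpleXMLElement, or false on a parse error.
$xml = simplexml_load_string($rss);

if ($xml !== false) {
  foreach ($xml->channel->item as $item) {
    // Cast the SimpleXMLElement to a string before escaping it.
    echo htmlspecialchars((string)$item->title, ENT_COMPAT, 'UTF-8')."<br />\n";
  }
}
```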


3 thoughts on “How to Fetch Page Content Using PHP”

  1. Really good article Steve!
    Another thing that may be of note is that the core PHP function file_get_contents is a much simpler function to use; however, it is a great deal slower. Your function allows people to use the much more efficient and powerful cURL library with ease. The only thing I would consider is having the function return false if the request times out, so people can check for a failed request quickly. Otherwise, good work!

    1. Yes, for that we can set the CURLOPT_CONNECTTIMEOUT option with the number of seconds to wait. We could then add a parameter to the function to set this per connection.

      Thanks Gavin, great comment.
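
      A sketch of that per-connection version, with a default timeout — curl_exec() returns false on failure (including timeouts), so callers can check the result before using it:

```php
<?php
// Variation on get_page_content with a per-call timeout parameter.
function get_page_content($url, $timeout = 10){
  $resource = curl_init();

  curl_setopt($resource, CURLOPT_URL, $url);
  curl_setopt($resource, CURLOPT_HEADER, false);
  curl_setopt($resource, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($resource, CURLOPT_CONNECTTIMEOUT, $timeout);

  $content = curl_exec($resource); // false on failure, including timeouts

  curl_close($resource);

  return $content;
}
```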
