Site Search using MSN API and PHP

September 2006

If you’re interested in having a site search on your web site, you can create one yourself, use one of the free services, or better yet use one of the existing search engines. The big three (Google, MSN, Yahoo) have each released APIs so you can query them and then present the results any way you wish. This article presents a PHP class which queries MSN Search and returns the results as an array.

Quick Start

Warning!

You should *never* try out code on your production server. Always have a test or development box for trying out new code. Experimenting on your live server is asking for trouble!

If you’re in a hurry (which you shouldn’t be), just grab the file with the code. You’ll need an MSN API key before continuing. After getting your key, open /search-msn.php and look for the line that reads:
$msnsearch = new MSNSearch('INSERTAPIKEYHERE');
Put your API key in, save the file, and place the files in the following locations on your test web server. The additional files in the archive aren’t needed; they’ll be explained later.

/search-msn.php
/include/MSNSearch.php
/include/nusoapx.php

That’s it! Navigate to /search-msn.php and you should get a functioning web search. Of course, you should (must?) modify some of the setup for your own site, so if you didn’t get the results you expect, keep reading.

SOAP and possible problems

SOAP is “a lightweight protocol intended for exchanging structured information in a decentralized, distributed environment” (w3c.org SOAP docs). In short, it’s a way for you to call a service on a remote machine and get the results back in an easy-to-use format. PHP4 has no built-in SOAP support (PHP5 does), but a freely available library called NuSOAP fills the gap; it can be downloaded from http://sourceforge.net/projects/nusoap/.

The procedure is simple, if a little cryptic. Create a SOAP request to query MSN Search, and receive the results as an array. Parse out the array and display the results in a reasonable manner, such as a definition list (dl, dt, dd). This article presents both a class for performing the request to MSN and a sample end-user search page. The results page is XHTML strict compliant, and is heavily tagged with CSS classes so you can format it any way you wish without modifying the PHP code.

If you have problems, you may be experiencing one of the following issues:

  1. Character set issues. UTF-8 should be used on all pages. If you’re using something else, consider changing to UTF-8.
  2. NuSOAP uses ISO-8859-1 internally by default. It should be changed to UTF-8.
  3. Encode the XML entities (>, < and &) in anything you present to the user using PHP’s htmlspecialchars function or you’ll have invalid HTML/XHTML.
  4. Extra Credit: You are serving your XHTML as application/xhtml+xml MIME type, aren’t you?
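
For the extra-credit item, one common approach is to send application/xhtml+xml only to browsers that list it in their Accept header, and fall back to text/html otherwise. The following is a minimal sketch of that idea; the function name pick_mime_type() is hypothetical (it is not part of the download), and the check deliberately ignores q-values.

```php
<?php
// Hypothetical helper (not part of the article's download): choose a MIME
// type from the browser's Accept header. Simplification: q-values are ignored.
function pick_mime_type($accept)
{
    if (stripos($accept, 'application/xhtml+xml') !== false) {
        return 'application/xhtml+xml';
    }
    return 'text/html';
}

// Usage in a page, before any output is sent:
// $accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
// header('Content-Type: ' . pick_mime_type($accept) . '; charset=utf-8');
```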

PHP, character encodings and modifying NuSOAP

PHP itself has limited support for different character encodings. A good article on character sets is a place to begin, and it has a list of other resources you might find helpful if you’re new to this issue. For now, we’ll assume you’re using UTF-8 on your site.

If you see strange characters in your results, it may be a result of incorrect character encodings. Character encoding issues are frequently ignored or misunderstood by webmasters but can be a source of problems; in fact NuSOAP uses ISO-8859-1 by default. This is a problem, as you should be using UTF-8 encoding; a few tweaks to the NuSOAP code are required. (Of course, if you only use ASCII characters, this issue won’t present itself).
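
To see what this kind of corruption looks like, the short sketch below takes a UTF-8 string and converts it as if its bytes were ISO-8859-1. It is only an illustration of the class of bug, not NuSOAP’s actual code, and it assumes the mbstring extension is available.

```php
<?php
// Illustration only: what happens when UTF-8 bytes are mistaken for ISO-8859-1.
// Requires the mbstring extension.
$original = "caf\xC3\xA9";   // "café" encoded as UTF-8 (5 bytes)

// Treat the bytes as ISO-8859-1 and convert them "to" UTF-8. Each byte of the
// two-byte "é" sequence becomes its own character.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo strlen($original), "\n";   // 5
echo strlen($garbled), "\n";    // 7
echo bin2hex($garbled), "\n";   // 636166c383c2a9
```

The single "é" has turned into two characters, which is the typical "garbage" seen in search results when the encodings disagree.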

The nusoapx.php file has been modified (from the 0.7.2 Aug 2005 version you can download) in about 10 places (look at the enclosed nusoap.patch file for the specifics) for two reasons.

  1. The already mentioned UTF-8 issue. NuSOAP 0.7.2 uses ISO-8859-1 internally, which corrupts UTF-8 strings. Since you should use UTF-8 on your web site, the search results will have garbage characters for any non-ASCII characters on your pages. Thus, the NuSOAP file is modified to change from ISO-8859-1 to UTF-8. (Lines 131-132, 3110, 5833, 6433)
  2. A naming conflict exists with PHP5. NuSOAP defines a class named soapclient, which collides with the SoapClient class built into PHP5 when the SOAP extension is loaded (PHP5 has built-in SOAP support; PHP4 does not). PHP class names are case-insensitive, so you’ll get nasty errors about a duplicate class name (namespaces anyone?). To avoid this, the class is renamed to soapclientx and the file is renamed to nusoapx.php. (Lines 2827, 6407, 6474, 7054)

Other than that, it’s the 0.7.2 version of NuSOAP. You don’t have to use the enclosed NuSOAP file, but if you don’t, you’ll need to rename soapclientx back to soapclient in one place in the MSNSearch.php file. If you’re using PHP5, you can also use the built-in SOAP support, but that route hasn’t been tested with this class.
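
The conflict comes from the fact that PHP treats class names case-insensitively, so soapclient and SoapClient are the same name as far as the engine is concerned. The sketch below demonstrates this with a stand-in class (Widget is hypothetical) and shows one way to check whether the built-in SOAP class is already loaded.

```php
<?php
// Stand-in class for illustration; "Widget" is hypothetical.
class Widget {}

// Class names are case-insensitive in PHP, so all of these refer to Widget.
var_dump(class_exists('widget'));   // bool(true)
var_dump(class_exists('WIDGET'));   // bool(true)

// A guard like this tells you whether the SOAP extension's class is loaded,
// in which case a second class called "soapclient" cannot be declared.
$builtinSoapLoaded = class_exists('SoapClient', false);
```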

Code discussion — MSNSearch class

The MSNSearch class is designed to query MSN Search, returning the results as an array. You should uncomment the "MAILTO" definition at the top of the file and put in your email address so the class can email you if an error occurs during processing.

OK, with all that as introduction, let’s examine the code which does the actual query to MSN. The interesting code is all in the search() function which is all we’ll discuss. The remainder of the class relates to formatting results and is basically clear by reading the code.

function search() {
  $this->errorMessage = "";
  $this->totalPages = 0;
  $this->totalRecords = 0;
  $this->results = array();
  $parameters = array(
    'AppID' => $this->appID,
    'Query' => $this->query,
    'CultureInfo' => 'en-US',
    'SafeSearch' => ($this->safeSearch ? 'Moderate' : 'Off'), # Could be Strict here as well
    'Requests' => array (
      'SourceRequest' => array (
        'Source' => 'Web',
        'Offset' => ($this->page - 1) * $this->recordsPerPage,
        'Count' => $this->recordsPerPage,
        'ResultFields' => 'All'
      )
    )
  );

This just sets up the parameters for the search query. All are documented in the API and most are obvious, so not much needs to be said.
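
The one parameter that involves a little arithmetic is Offset: pages are counted from one, while the offset sent to MSN starts at zero. The same records-per-page value is used later in search() to derive the page count. A small sketch of both calculations follows; the helper names are hypothetical and exist only for this example.

```php
<?php
// Hypothetical helpers mirroring the arithmetic used in search().
function page_offset($page, $recordsPerPage)
{
    return ($page - 1) * $recordsPerPage;   // page 1 starts at offset 0
}

function total_pages($totalRecords, $recordsPerPage)
{
    return (int) ceil($totalRecords / $recordsPerPage);
}

echo page_offset(3, 10), "\n";   // 20
echo total_pages(95, 10), "\n";  // 10
```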

$soapClient = new soapclientx("http://soap.search.msn.com/webservices.asmx");
$retry_count = 0;
$soapResult = false;
while( !$soapResult && $retry_count < 3) {  # Try request 3 times before failing
  $soapResult = $soapClient->call('Search', array ('Request' => $parameters), "http://schemas.microsoft.com/MSNSearch/2005/09/fex" );
  $retry_count++;
}
if ($soapClient->getError()) {
  $this->errorMessage = $soapClient->getError();
  $msg = "Search error:\n" . $this->errorMessage . "\n";
  $msg .= "Query: " . $this->query . "\n";
  $msg .= "Date: " . date('D M d Y H:i:s');  # 24-hour clock; 'h' would be ambiguous without am/pm
  if (defined("MAILTO"))
    @mail(MAILTO,"** Search ERROR ***", $msg);
  return false;
}

This is the heart of the function: it makes the request to MSN’s servers, trying up to three times before giving up. If the request still fails, the SOAP error is stored in errorMessage, false is returned, and, if the "MAILTO" constant is defined (at the top of the file), you’ll get an email describing the failure.
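
The retry pattern itself is independent of SOAP. As a minimal sketch, assuming PHP 5.3 or later for the anonymous function, it can be pulled out into a helper; retry_call() is a hypothetical name and is not part of the MSNSearch class.

```php
<?php
// Hypothetical generic retry helper; $attempt is any callable that returns
// a truthy value on success and a falsy value on failure.
function retry_call($attempt, $maxTries = 3)
{
    $result = false;
    for ($try = 1; $try <= $maxTries && !$result; $try++) {
        $result = call_user_func($attempt, $try);
    }
    return $result;
}

// Example: a stand-in operation that fails twice, then succeeds.
$calls = 0;
$result = retry_call(function ($try) use (&$calls) {
    $calls++;
    return $try < 3 ? false : array('ok' => true);
});
```

One consequence of testing for a falsy result, both here and in the loop above, is that a response that is legitimately empty is treated the same as a failure and retried.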

$this->totalRecords = $soapResult['Responses']['SourceResponse']['Total'];
$this->totalPages = ceil($this->totalRecords / $this->recordsPerPage);
if (($this->totalRecords > 0) && (is_array($soapResult['Responses']['SourceResponse']['Results']['Result']))) {
  foreach ($soapResult['Responses']['SourceResponse']['Results']['Result'] as $item) {
    $this->results[] = array (
      'url' => $item['Url'],
      'displayurl' => $item['DisplayUrl'],
      'cacheurl' => isset($item['CacheUrl']) ? $item['CacheUrl'] : "",
      'title' => isset($item['Title']) && trim($item['Title']) != '' ? $item['Title'] : $item['DisplayUrl'],
      'snippet' => isset($item['Description']) ? $item['Description'] : ""
    );
  }
}

Now parse out the results returned. Most of this is a simple loop with one caveat: MSN can return null or empty strings (for example, if a page is indexed but not cached, CacheUrl will be undefined). Thus, we need to check whether each field is defined and, if not, provide a reasonable default.
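
The defaulting logic can be sketched as a standalone function, which makes it easy to try out on sample data. The helper names below are hypothetical. The as_result_list() wrapper rests on an assumption the article does not state: that a response with exactly one hit may arrive as a single associative array rather than a one-element list, which is worth checking against your own results.

```php
<?php
// Hypothetical helpers; field names match those used in search().

// Wrap a lone result (an associative array with a 'Url' key) in a list so
// callers can always iterate. Assumption: a single hit may arrive unwrapped.
function as_result_list($result)
{
    if (!is_array($result)) {
        return array();
    }
    return isset($result['Url']) ? array($result) : $result;
}

// Apply the same fallbacks as search(): empty strings for missing fields,
// and the display URL when the title is missing or blank.
function normalize_item(array $item)
{
    $title = isset($item['Title']) ? trim($item['Title']) : '';
    return array(
        'url'        => $item['Url'],
        'displayurl' => $item['DisplayUrl'],
        'cacheurl'   => isset($item['CacheUrl']) ? $item['CacheUrl'] : '',
        'title'      => $title !== '' ? $item['Title'] : $item['DisplayUrl'],
        'snippet'    => isset($item['Description']) ? $item['Description'] : '',
    );
}

$single = array('Url' => 'http://example.com/', 'DisplayUrl' => 'example.com', 'Title' => '  ');
$rows = array_map('normalize_item', as_result_list($single));
```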

If all was successful, return true. If it failed, false is returned and the errorMessage class variable will have the SOAP error.

That’s about it for the MSNSearch class.

Code Discussion — Search Template

Using the class involves just a few steps:

  1. Create an instance of the MSNSearch class: $msnsearch = new MSNSearch('INSERTAPIKEYHERE');
  2. Set the query you wish to search for: $msnsearch->setQuery($q);
  3. Set the start page: $msnsearch->setPage($start);
  4. Perform the search: $sresult = $msnsearch->search();
  5. If the search is successful, print the results using methods from the class. MSNSearch has methods making this easy!

Output is tagged with CSS classes to make formatting easy. Here’s an example showing how simple it is:

<?php
print searchform($q,"form_top",SEARCH_URL);
if (strlen($q) > 1) {
  $msnsearch = new MSNSearch('INSERTAPIKEYHERE');
  $sresult = false;
  $msnsearch->setQuery($q);
  $msnsearch->setPage($start);
  $sresult = $msnsearch->search();
  if (($sresult === true) && ($msnsearch->totalRecords > 0)) {
    print $msnsearch->search_header($q);
    print $msnsearch->search_results();
    print $msnsearch->search_navagation($q);
    print searchform($q,"form_bottom",SEARCH_URL);
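  } else {
    # Escape the query before echoing it back, or markup in $q breaks the page
    print "<p>Sorry, no results found for <b>" . htmlspecialchars($q) . "</b>.</p>\n";
  }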
}
?>

You need to put your API key in where it says INSERTAPIKEYHERE. The remainder shows a sample call, and uses the methods of the MSNSearch class to display results. One warning: you must use PHP’s htmlspecialchars function to ensure the results are encoded correctly and won’t invalidate your page (NEVER use htmlentities). The MSNSearch class performs this for you automatically, but if you decide to roll your own methods to display results, be sure you use htmlspecialchars to format your XHTML/HTML correctly. If you’re serving your XHTML as application/xhtml+xml MIME type (as you should), this is a critical step to take.
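
If you do roll your own display method, a minimal sketch might look like the following. render_results() is a hypothetical function, not the class’s own search_results() method, and the "results" CSS class name is likewise made up for the example. It assumes each row has the url, title and snippet keys built by search().

```php
<?php
// Hypothetical renderer (not the class's own search_results() method):
// turns result rows into a definition list, escaping every value.
function render_results(array $rows)
{
    $html = "<dl class=\"results\">\n";
    foreach ($rows as $row) {
        $url     = htmlspecialchars($row['url'], ENT_QUOTES, 'UTF-8');
        $title   = htmlspecialchars($row['title'], ENT_QUOTES, 'UTF-8');
        $snippet = htmlspecialchars($row['snippet'], ENT_QUOTES, 'UTF-8');
        $html .= "  <dt><a href=\"$url\">$title</a></dt>\n";
        $html .= "  <dd>$snippet</dd>\n";
    }
    return $html . "</dl>\n";
}

echo render_results(array(array(
    'url'     => 'http://example.com/?a=1&b=2',
    'title'   => 'Q&A <draft>',
    'snippet' => 'Tom & Jerry',
)));
```

Passing the charset explicitly keeps the behaviour the same across PHP versions, since the default charset of htmlspecialchars has changed over time.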

The only other code change required for a site-search instead of a global web search is to restrict MSN to search only your site. Find the code with the following line:
$msnsearch->setQuery($q);

Change it to something like the following:
$msnsearch->setQuery($q . " site:mysite.com");

Now, you’ll only see results from your site.
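
If you prefer to keep the domain in one place, the restriction can be wrapped in a small helper. This is only a sketch: site_query() is a hypothetical name, and it does nothing about a site: operator the user may have typed into the query themselves.

```php
<?php
// Hypothetical helper: append a site: restriction to the user's query.
function site_query($q, $domain)
{
    return trim($q) . ' site:' . $domain;
}

echo site_query(' php soap ', 'mysite.com'), "\n";   // php soap site:mysite.com
```

It would then be used as $msnsearch->setQuery(site_query($q, 'mysite.com'));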

Summary

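That’s all that’s required to set up a search using MSN’s API. To summarize, perform the following:

  1. Get an MSN API key and put it in /search-msn.php.
  2. Place /search-msn.php, /include/MSNSearch.php and /include/nusoapx.php on your test web server.
  3. Make sure your pages use UTF-8.
  4. Add a site: restriction to the query so only results from your own site are returned.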
