Site Search using MSN API and PHP
September 2006
If you’re interested in having a site search on your web site, you can create one yourself, use one of the free services, or, better yet, use one of the existing search engines. The big three (Google, MSN, Yahoo) have each released APIs so you can query them and then present the results any way you wish. This article presents a PHP class which queries MSN Search and returns the results as an array.
Quick Start
Warning!
You should *never* try out code on your production server. Always have a test or development box for trying out new code. Experimenting on your live server is asking for trouble!
If you’re in a hurry (which you shouldn’t be), just grab the file with the code. You’ll need an MSN API key before continuing. After getting your key, open up /search-msn.php and look for the line that reads:
$msnsearch = new MSNSearch('INSERTAPIKEYHERE');
Put your API key in, save the file, and place the files in the following locations on your test web server. The additional files in the archive aren’t needed — they’ll be explained later.
/search-msn.php
/include/MSNSearch.php
/include/nusoapx.php
That’s it! Navigate to /search-msn.php and you should get a functioning web search. Of course, you should (must?) modify some of the setup for your own site, so if you didn’t get the results you expect, keep reading.
SOAP and possible problems
SOAP is “a lightweight protocol intended for exchanging structured information in a decentralized, distributed environment” (w3c.org SOAP docs). In short, it’s a way for you to call a service on a remote machine and get the results back in an easy to use format. PHP4 has no built-in SOAP support (PHP5 adds it), but a freely available library called NuSOAP fills the gap; it can be downloaded from http://sourceforge.net/projects/nusoap/.
The procedure is simple, if a little cryptic. Create a SOAP request to query MSN Search, and receive the results as an array. Parse out the array and display the results in a reasonable manner, such as a definition list (dl,dt,dd). This article presents both a class for performing the request to MSN, and a sample end-user search page. The results page is XHTML Strict compliant, and is heavily tagged with CSS classes so you can format it any way you wish without modifying the PHP code.
If you have problems, you may be experiencing one of the following issues:
- Character set issues. UTF-8 should be used on all pages. If you’re using something else, consider changing to UTF-8.
- NuSOAP uses ISO-8859-1 internally by default. It should be changed to UTF-8.
- Encode the XML entities (>, < and &) in anything you present to the user using PHP’s htmlspecialchars function or you’ll have invalid HTML/XHTML.
- Extra Credit: You are serving your XHTML as application/xhtml+xml MIME type, aren’t you?
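For the extra-credit item, here is a minimal sketch of one way to pick the MIME type per request. It is not part of the article's download: the function name choose_xhtml_mime_type() is hypothetical, and the check is a plain substring match on the Accept header that ignores q-values, so treat it as a starting point only.

```php
<?php
// Hypothetical helper (not in the article's archive): choose a Content-Type
// for an XHTML page based on the browser's Accept header.
function choose_xhtml_mime_type($acceptHeader)
{
    // Simplification: substring check only; q-values are ignored.
    if (stripos($acceptHeader, 'application/xhtml+xml') !== false) {
        return 'application/xhtml+xml';
    }
    // Fall back for browsers that do not advertise XHTML support.
    return 'text/html';
}

// Typical use at the very top of a page, before any output is sent.
$accept   = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
$mimeType = choose_xhtml_mime_type($accept);
if (!headers_sent()) {
    header('Content-Type: ' . $mimeType . '; charset=utf-8');
}
```

Sending the charset in the same header keeps the declared encoding consistent with the UTF-8 advice above.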
PHP, character encodings and modifying NuSOAP
PHP itself has limited support for different character encodings. A good article on character sets is a place to begin, and it has a list of other resources you might find helpful if you’re new to this issue. For now, we’ll assume you’re using UTF-8 on your site.
If you see strange characters in your results, it may be a result of incorrect character encodings. Character encoding issues are frequently ignored or misunderstood by webmasters but can be a source of problems; in fact NuSOAP uses ISO-8859-1 by default. This is a problem, as you should be using UTF-8 encoding; a few tweaks to the NuSOAP code are required. (Of course, if you only use ASCII characters, this issue won’t present itself).
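To see what those strange characters look like, the sketch below re-encodes a UTF-8 string as though each of its bytes were an ISO-8859-1 character. This is an illustration of one common way such corruption arises, not NuSOAP's actual code; the function name latin1_bytes_to_utf8() is made up for this example.

```php
<?php
// Illustration only: treat every byte as an ISO-8859-1 character and
// re-encode it as UTF-8. Feeding it text that is already UTF-8 produces
// the familiar two-characters-per-accent garbage.
function latin1_bytes_to_utf8($bytes)
{
    $out    = '';
    $length = strlen($bytes);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($bytes[$i]);
        if ($byte < 0x80) {
            // ASCII is identical in both encodings.
            $out .= chr($byte);
        } else {
            // Code points 0x80-0xFF become a two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

$original = "caf\xC3\xA9";                 // "café" as UTF-8 bytes
$garbled  = latin1_bytes_to_utf8($original);
print $garbled . "\n";                     // the two bytes of "é" are now two separate characters
```

Pure ASCII input passes through unchanged, which is why the problem stays hidden until a non-ASCII character shows up.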
The nusoapx.php file has been modified (from the 0.7.2 Aug 2005 version you can download) in about 10 places (look at the enclosed nusoap.patch file for the specifics) for two reasons.
- The already mentioned UTF-8 issue. NuSOAP 0.7.2 uses ISO-8859-1 internally, which corrupts UTF-8 strings. Since you should use UTF-8 on your web site, the search results will have garbage characters for any non-ASCII characters on your pages. Thus, the NuSOAP file is modified to change from ISO-8859-1 to UTF-8. (Lines 131-132, 3110, 5833, 6433)
- A naming conflict exists with PHP5. NuSOAP defines a class named soapclient, which clashes with the SoapClient class in PHP5 (PHP5 has built-in SOAP support; PHP4 does not). You’ll get nasty errors about the duplicate class name (namespaces anyone?), so the class is renamed to soapclientx, and the file is renamed to nusoapx.php. (Lines 2827, 6407, 6474, 7054)
Other than that it’s the 0.7.2 version of NuSOAP. You don’t have to use the enclosed NuSOAP file, but if you don’t, you’ll need to change soapclientx back to soapclient in one place in the MSNSearch.php file. If you’re using PHP5, you can also use the built-in SOAP support, but that hasn’t been tested.
Code discussion — MSNSearch class
The MSNSearch class is designed to query MSN Search, returning the results as an array. You should un-comment the "MAILTO" definition at the top of the file and put in your email address to allow the class to email you if an error occurs during processing.
OK, with all that as introduction, let’s examine the code which does the actual query to MSN. The interesting code is all in the search() function which is all we’ll discuss. The remainder of the class relates to formatting results and is basically clear by reading the code.
function search() {
    $this->errorMessage = "";
    $this->totalPages = 0;
    $this->totalRecords = 0;
    $this->results = array();
    $parameters = array(
        'AppID' => $this->appID,
        'Query' => $this->query,
        'CultureInfo' => 'en-US',
        'SafeSearch' => ($this->safeSearch ? 'Moderate' : 'Off'), # Could be Strict here as well
        'Requests' => array (
            'SourceRequest' => array (
                'Source' => 'Web',
                'Offset' => ($this->page - 1) * $this->recordsPerPage,
                'Count' => $this->recordsPerPage,
                'ResultFields' => 'All'
            )
        )
    );
This just sets up the parameters for the search query. All are documented in the API and most are obvious, so not much needs to be said.
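The one parameter that involves arithmetic is Offset, which turns a 1-based page number into a 0-based starting record. The sketch below pulls the array into a standalone function so that calculation can be checked on its own. The function build_search_parameters() and its argument list are hypothetical and not part of the MSNSearch class; clamping the page to 1 is an added safeguard, not something the original code does.

```php
<?php
// Hypothetical standalone helper mirroring the parameter array in search().
function build_search_parameters($appId, $query, $page, $recordsPerPage, $safeSearch = true)
{
    // Pages are 1-based; clamp lower values so the offset never goes negative.
    $page = max(1, (int) $page);

    return array(
        'AppID'       => $appId,
        'Query'       => $query,
        'CultureInfo' => 'en-US',
        'SafeSearch'  => ($safeSearch ? 'Moderate' : 'Off'),
        'Requests'    => array(
            'SourceRequest' => array(
                'Source'       => 'Web',
                // Page 1 starts at record 0, page 2 at $recordsPerPage, and so on.
                'Offset'       => ($page - 1) * $recordsPerPage,
                'Count'        => $recordsPerPage,
                'ResultFields' => 'All'
            )
        )
    );
}

$parameters = build_search_parameters('INSERTAPIKEYHERE', 'php soap', 3, 10);
print $parameters['Requests']['SourceRequest']['Offset'] . "\n"; // prints 20
```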
    $soapClient = new soapclientx("http://soap.search.msn.com/webservices.asmx");
    $retry_count = 0;
    $soapResult = false;
    while (!$soapResult && $retry_count < 3) { # Try request 3 times before failing
        $soapResult = $soapClient->call('Search', array ('Request' => $parameters), "http://schemas.microsoft.com/MSNSearch/2005/09/fex" );
        $retry_count++;
    }
    if ($soapClient->getError()) {
        $this->errorMessage = $soapClient->getError();
        $msg = "Search error:\n" . $this->errorMessage . "\n";
        $msg .= "Query: " . $this->query . "\n";
        $msg .= "Date: " . date('D M d Y H:i:s'); # 24-hour clock; 'h' without am/pm is ambiguous
        if (defined("MAILTO")) {
            @mail(MAILTO,"** Search ERROR ***", $msg);
        }
        return false;
    }
This is the heart of the function: it makes the request to MSN’s servers, trying up to three times before giving up. If the request still fails and the "MAILTO" constant is defined (at the top of the file), you’ll get an email describing the error.
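The retry logic is independent of SOAP, so it can be sketched as a small generic helper. This is an illustration rather than code from the MSNSearch class: call_with_retries() is a hypothetical name, it relies on closures (PHP 5.3 or later), and like the loop above it treats any falsy return value as a failure.

```php
<?php
// Hypothetical helper: run $operation until it returns a truthy value
// or $maxAttempts is reached. The attempt number (1-based) is passed in.
function call_with_retries($operation, $maxAttempts = 3)
{
    $result = false;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = call_user_func($operation, $attempt);
        if ($result) {
            break; // success, stop retrying
        }
    }
    return $result;
}

// Stand-in for a flaky remote call: fails twice, then succeeds.
$attemptsSeen = 0;
$flaky = function ($attempt) use (&$attemptsSeen) {
    $attemptsSeen = $attempt;
    return $attempt < 3 ? false : array('ok' => true);
};

$result = call_with_retries($flaky);
print "Succeeded after $attemptsSeen attempts\n";
```

With the article's client, the operation would be a closure wrapping the $soapClient->call(...) line shown above.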
    $this->totalRecords = $soapResult['Responses']['SourceResponse']['Total'];
    $this->totalPages = ceil($this->totalRecords / $this->recordsPerPage);
    if (($this->totalRecords > 0) && (is_array($soapResult['Responses']['SourceResponse']['Results']['Result']))) {
        foreach ($soapResult['Responses']['SourceResponse']['Results']['Result'] as $item) {
            $this->results[] = array (
                'url' => $item['Url'],
                'displayurl' => $item['DisplayUrl'],
                'cacheurl' => isset($item['CacheUrl']) ? $item['CacheUrl'] : "",
                'title' => isset($item['Title']) && trim($item['Title']) != '' ? $item['Title'] : $item['DisplayUrl'],
                'snippet' => isset($item['Description']) ? $item['Description'] : ""
            );
        }
    }
Now parse out the results returned. Most of this is a simple loop with one caveat: MSN can return null or empty strings (for example, if a page is indexed but not cached, CacheUrl will be undefined). Thus, we need to check whether each field is defined, and if not, provide a reasonable default.
If all was successful, search() returns true. If it failed, it returns false and the errorMessage class variable will contain the SOAP error.
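The defaulting rules and the page count can be exercised without contacting MSN. The sketch below restates them as two standalone functions; normalize_result() and total_pages() are hypothetical names, and the extra isset() checks on Url and DisplayUrl plus the guard against a zero page size are additions beyond what the class does.

```php
<?php
// Hypothetical helper: map one raw result item to the array shape used
// by the class, filling in defaults for missing or empty fields.
function normalize_result($item)
{
    $displayUrl = isset($item['DisplayUrl']) ? $item['DisplayUrl'] : '';
    $hasTitle   = isset($item['Title']) && trim($item['Title']) != '';

    return array(
        'url'        => isset($item['Url']) ? $item['Url'] : '',
        'displayurl' => $displayUrl,
        'cacheurl'   => isset($item['CacheUrl']) ? $item['CacheUrl'] : '',
        // Fall back to the display URL when the page has no usable title.
        'title'      => $hasTitle ? $item['Title'] : $displayUrl,
        'snippet'    => isset($item['Description']) ? $item['Description'] : ''
    );
}

// Hypothetical helper: number of pages needed to show all records.
function total_pages($totalRecords, $recordsPerPage)
{
    if ($recordsPerPage <= 0) {
        return 0; // avoid division by zero
    }
    return (int) ceil($totalRecords / $recordsPerPage);
}

$row = normalize_result(array(
    'Url'        => 'http://example.com/page',
    'DisplayUrl' => 'example.com/page',
    'Title'      => '   '
));
print $row['title'] . "\n";          // prints example.com/page
print total_pages(25, 10) . "\n";    // prints 3
```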
That’s about it for the MSNSearch class.
Code Discussion — Search Template
Using the class involves just a few steps:
- Create an instance of the MSNSearch class: $msnsearch = new MSNSearch('INSERTAPIKEYHERE');
- Set the query you wish to search for: $msnsearch->setQuery($q);
- Set the start page: $msnsearch->setPage($start);
- Perform the search: $sresult = $msnsearch->search();
- If search is successful, print the results using functions from the class. MSNSearch has methods making this easy!
- search_header(): Displays a header with the number of results and pages.
- search_results(): Displays all the results formatted as a definition list.
- search_navagation(): Displays navigation for the next and previous pages, if any. Links are created automatically. (The method name really is spelled this way in the code.)
- searchform($q,"form_top",SEARCH_URL): Creates a search form anywhere on your page. $q is the query and is displayed in the input box. "form_top" can be any string and is used to tag the resulting form with a CSS ID. SEARCH_URL is the link to your search page; it is predefined as /search-msn.php, and you can change it at the top of the search-msn.php file.
Output is tagged with CSS classes to make formatting easy. Here’s an example showing how simple it is:
<?php
print searchform($q,"form_top",SEARCH_URL);
if (strlen($q) > 1) {
    $msnsearch = new MSNSearch('INSERTAPIKEYHERE');
    $msnsearch->setQuery($q);
    $msnsearch->setPage($start);
    $sresult = $msnsearch->search();
    if (($sresult === true) && ($msnsearch->totalRecords > 0)) {
        print $msnsearch->search_header($q);
        print $msnsearch->search_results();
        print $msnsearch->search_navagation($q);
        print searchform($q,"form_bottom",SEARCH_URL);
    } else {
        # Escape the user's query before echoing it back into the page
        print "<p>Sorry, no results found for <b>" . htmlspecialchars($q) . "</b>.</p>\n";
    }
}
?>
You need to put your API key in where it says INSERTAPIKEYHERE. The remainder shows a sample call, and uses the methods of the MSNSearch class to display results. One warning: you must use PHP’s htmlspecialchars function to ensure the results are encoded correctly and won’t invalidate your page (NEVER use htmlentities). The MSNSearch class performs this for you automatically, but if you decide to roll your own methods to display results, be sure you use htmlspecialchars so your XHTML/HTML stays valid. If you’re serving your XHTML as application/xhtml+xml MIME type (as you should), this is a critical step to take.
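If you do roll your own display code, a minimal sketch might look like the following. It is not the class's search_results() method: the function name render_results_as_definition_list() and the CSS class names are invented for this example, and it assumes each result uses the 'url', 'title' and 'snippet' keys built in search().

```php
<?php
// Hypothetical renderer: turn the results array into a definition list,
// escaping every value with htmlspecialchars before it reaches the page.
function render_results_as_definition_list($results)
{
    if (count($results) == 0) {
        return ''; // an empty <dl> is not valid XHTML Strict
    }

    $html = "<dl class=\"search-results\">\n";
    foreach ($results as $result) {
        $url     = htmlspecialchars($result['url'], ENT_QUOTES, 'UTF-8');
        $title   = htmlspecialchars($result['title'], ENT_QUOTES, 'UTF-8');
        $snippet = htmlspecialchars($result['snippet'], ENT_QUOTES, 'UTF-8');

        $html .= "<dt class=\"search-title\"><a href=\"$url\">$title</a></dt>\n";
        $html .= "<dd class=\"search-snippet\">$snippet</dd>\n";
    }
    $html .= "</dl>\n";

    return $html;
}

$sample = array(array(
    'url'     => 'http://example.com/?a=1&b=2',
    'title'   => 'Tom & Jerry <b>',
    'snippet' => 'Cat & mouse'
));
print render_results_as_definition_list($sample);
```

Note that the ampersand in the URL is escaped as well; a raw & inside an href attribute is enough to make an XHTML page invalid.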
The only other code change required for a site-search instead of a global web search is to restrict MSN to search only your site. Find the code with the following line:
$msnsearch->setQuery($q);
Change it to something like the following:
$msnsearch->setQuery($q . " site:mysite.com");
Now, you’ll only see results from your site.
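If you would rather not hard-code the string concatenation, the same idea can be wrapped in a small helper. This is only a sketch: build_site_query() is a hypothetical name, and stripping the host name down to letters, digits, dots and hyphens is a precaution added here, not something the article's code does.

```php
<?php
// Hypothetical helper: append a site: restriction to the user's query.
function build_site_query($query, $site)
{
    // Keep only characters that can appear in a plain host name.
    $site  = preg_replace('/[^A-Za-z0-9.\-]/', '', $site);
    $query = trim($query);

    if ($site === '') {
        return $query; // nothing to restrict to
    }
    return $query . ' site:' . $site;
}

print build_site_query('  php soap ', 'mysite.com') . "\n"; // prints php soap site:mysite.com
```

It would then be used as $msnsearch->setQuery(build_site_query($q, 'mysite.com'));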
Summary
That’s all that’s required to set up a search using MSN’s API. To summarize, perform the following steps:
- Edit the top of MSNSearch.php to include your email address so the script can report search failures.
- Edit the search-msn.php file to include your MSN API key.
- Edit your CSS for the custom classes to present your results as you want.
- Edit the search-msn.php file to restrict the results to your site.
- For advanced uses, you can move or rename the files as you want, but you’ll have to modify the code a bit to include the proper files/paths.
Copyright © 1999-2008 Darrin Yeager. http://www.dyeager.org
This page is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. In summary, you are free to share (copy and distribute) the work under the following conditions (see the actual license for more information):
- Attribution. You must attribute the work to the author (but not in any way that suggests that they endorse you or your use of the work). Attribution should refer back to this web page and include a copyright notice and the license terms.
- Noncommercial. You may not use this work for commercial purposes.
- No Derivative Works. You may not alter, transform, or build upon this work.