
Search

Development: http://pole.uwaterloo.ca/cpadev/web/search/
Production: http://search.uwaterloo.ca/
Subversion: http://lauren.uwaterloo.ca/repos/cpadev/search/
This is the program that runs the searches that appear throughout the university's website. It searches for people through UWDir, for keywords through the cpadev keywords database, and for web pages through the Google and Yahoo! APIs. The application uses an explicit Model/View/Controller architecture, with each part in its own directory, linked together in Search.class.php.

SearchEngine
This is a series of classes that abstract out the code for performing the search itself. SearchEngine.class.php is an abstract class, which is subclassed in the files in /SearchEngine, for each of the four search engines (UWDir, Keywords, Google, Yahoo).

  • Google AJAX Search: HTTP GET request -> JSON response
  • Google SOAP Search: SOAP request -> SOAP response
  • Yahoo! Search: HTTP GET request -> XML response
  • UWDir Search: LDAP request -> LDAP response
  • Keyword Search: SQL query on the cpadev keywords database -> standard PHP mysql response

The SearchEngine classes are also responsible for parsing the response, placing each search result item into a SearchResult object, and collecting those objects into a SearchResultset, which is used to display the results. Some SearchEngines also implement caching.
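The abstract-class arrangement can be pictured with a minimal Python sketch. It assumes a `search()` template method that calls a `fetch()` hook (send the engine-specific request) and a `parse()` hook (turn the raw response into a SearchResultset). Those method names, and the JSON field names (`responseData`, `results`, `titleNoFormatting`, `unescapedUrl`, `content`), are assumptions modelled on the retired Google AJAX Search API, not code taken from Search.class.php.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """Container for one parsed result item."""
    title: str
    url: str
    summary: str = ""


@dataclass
class SearchResultset:
    """Collection of SearchResult objects handed to the View."""
    results: list = field(default_factory=list)


class SearchEngine(ABC):
    """Abstract base: each subclass supplies fetch() and parse()."""

    def search(self, query):
        raw = self.fetch(query)
        return self.parse(raw)

    @abstractmethod
    def fetch(self, query):
        """Send the engine-specific request and return the raw response."""

    @abstractmethod
    def parse(self, raw):
        """Turn the raw response into a SearchResultset."""


class GoogleAjaxEngine(SearchEngine):
    """Hypothetical subclass for an HTTP GET -> JSON engine."""

    def __init__(self, canned_response):
        # A canned JSON string stands in for the real HTTP request here.
        self.canned_response = canned_response

    def fetch(self, query):
        return self.canned_response

    def parse(self, raw):
        data = json.loads(raw)
        # responseData may be null on errors, so fall back to an empty dict.
        items = (data.get("responseData") or {}).get("results", [])
        resultset = SearchResultset()
        for item in items:
            resultset.results.append(
                SearchResult(
                    title=item.get("titleNoFormatting", ""),
                    url=item.get("unescapedUrl", ""),
                    summary=item.get("content", ""),
                )
            )
        return resultset
```

Adding another backend (LDAP, SQL, XML) means writing another fetch()/parse() pair; the View only ever sees a SearchResultset.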
SearchRequestParser
This group of classes parses the uniform search-request format that the user enters into the different formats each search engine requires. Currently the parser only performs the appropriate character escaping: for example, running the query through mysql_real_escape_string for keywords (since it is a database query) and escaping the LDAP filter that is passed to UWDir.
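The LDAP half of that escaping can be illustrated with the rules from RFC 4515, which require `\`, `*`, `(`, `)` and NUL inside a filter value to be written as a backslash followed by two hex digits. This Python sketch is illustrative only; the `cn` attribute and both function names are hypothetical, not the parser's actual code.

```python
# RFC 4515 escapes for characters that are special inside an LDAP filter value.
_LDAP_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value: str) -> str:
    """Escape a user-supplied string so the filter treats it literally."""
    # Mapping one character at a time means backslashes produced by the
    # escaping itself are never escaped a second time.
    return "".join(_LDAP_ESCAPES.get(ch, ch) for ch in value)


def build_name_filter(user_input: str) -> str:
    """Hypothetical filter: match the escaped input against the cn attribute."""
    return "(cn=" + escape_ldap_filter_value(user_input) + ")"
```

For example, `build_name_filter("smith (j)")` yields `(cn=smith \28j\29)`, so the parentheses cannot close the filter early.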
SearchResult
Defines objects which act as containers for results that are parsed by SearchEngine classes.
SearchAuditor
Keeps track of search terms and their frequency.
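A Python sketch of term-frequency tracking, assuming terms are normalised (whitespace collapsed, lower-cased) before counting. The real SearchAuditor presumably writes to the search_terms table; the in-memory Counter and the method names here are stand-ins.

```python
from collections import Counter


class SearchAuditor:
    """In-memory stand-in for a table of search terms and hit counts."""

    def __init__(self):
        self.hits = Counter()

    @staticmethod
    def normalise(term: str) -> str:
        # Collapse runs of whitespace and ignore case, so "Daily  Bulletin"
        # and "daily bulletin" count as the same term.
        return " ".join(term.split()).lower()

    def record(self, term: str) -> None:
        key = self.normalise(term)
        if key:  # skip empty or whitespace-only queries
            self.hits[key] += 1

    def most_common(self, n: int = 10):
        return self.hits.most_common(n)
```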
SearchSuggestion
Returns an array of query suggestions.
/config
Defines some global constants
/cache
Stores cached results from Google/Yahoo searches
/search_results.php
The main file for the View part of the application.

Libraries

NuSOAP (/lib/nusoap), a SOAP client for communicating with the Google API. /lib/xml-simple.php is no longer in use. PHP's built-in SimpleXML classes are used instead for parsing Yahoo search results.
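PHP's SimpleXML has a rough Python counterpart in xml.etree.ElementTree. The sketch below assumes a simplified response made of `<ResultSet><Result><Title/><Url/><Summary/></Result></ResultSet>` elements; the real Yahoo! response also carried an XML namespace and more fields, so treat the element names as assumptions.

```python
import xml.etree.ElementTree as ET


def parse_yahoo_xml(xml_text: str):
    """Return (title, url, summary) tuples from a simplified Yahoo!-style response."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("Result"):
        # findtext() returns the default when a child element is missing.
        results.append(
            (
                item.findtext("Title", default=""),
                item.findtext("Url", default=""),
                item.findtext("Summary", default=""),
            )
        )
    return results
```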

Original documentation

Created when the search was developed:
http://web.uwaterloo.ca/documentation/search/index.php
http://web.uwaterloo.ca/documentation/search/detaileddocumentation.php

Updated Search (Winter, 2009)

The search engine now uses the Google AJAX Search API as its primary search engine. The deprecated Google SOAP and Yahoo APIs now serve as backups in case the AJAX API returns no results. Google Custom Search can be enabled by adding the Custom Search key to the AJAX query string in this format: cx=CSE_KEY.
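The primary/backup arrangement amounts to trying the engines in order and keeping the first non-empty result list. In this Python sketch each engine is modelled as a plain callable; treating an engine that raises an error the same as one that returns nothing is a design choice of the sketch, not something the text specifies. The helper that appends `cx=CSE_KEY` uses the standard library's query-string encoding, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence
from urllib.parse import urlencode

Engine = Callable[[str], List[str]]


def search_with_fallback(query: str, engines: Sequence[Engine]) -> List[str]:
    """Try each engine in priority order; return the first non-empty result list."""
    for engine in engines:
        try:
            results = engine(query)
        except Exception:
            # A failing backend is treated like one that found nothing.
            results = []
        if results:
            return results
    return []


def add_custom_search_key(query_string: str, cse_key: str) -> str:
    """Append cx=<key> to an existing query string for Google Custom Search."""
    extra = urlencode({"cx": cse_key})
    return f"{query_string}&{extra}" if query_string else extra
```

Called as `search_with_fallback(q, [ajax, soap, yahoo])`, the SOAP and Yahoo engines are only consulted when the AJAX engine comes back empty.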
A database-based search statistics page is also available.
A few notes to anyone editing the AJAX search in the future:

  • The number of results per page must equal the number of results each JSON response returns, or a multiple of it. Otherwise the same JSON results may be fetched and read multiple times, which is inefficient. If this number changes in the future, be sure to update the corresponding values in the config files.
  • Each query returns a maximum of 64 results, that is, 8 pages x 8 results per page.
  • Do not depend on the value returned by the estimatedTotalResults function in the Google API. It varies and should only be used as a rough reference.
  • A mockup of dynamically loaded search results is in search_results_dynamicloading.php in the templates/dynamic folder. To see it, replace the contents of search_results.php with those of search_results_dynamicloading.php. Note that only the web portion of the search results is implemented.
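The first two notes come down to simple arithmetic. This Python sketch assumes each JSON response carries a fixed batch of 8 results (matching the 8 x 8 = 64 limit) and that pages are numbered from 1. It returns the `start` offsets a page would need to request, capped at 64, and rejects page sizes that are not a multiple of the batch size. The function name is hypothetical.

```python
BATCH_SIZE = 8     # results per JSON response (assumed, from 8 x 8 = 64)
MAX_RESULTS = 64   # hard cap per query noted above


def batch_offsets_for_page(page: int, per_page: int) -> list:
    """Return the start offsets of the JSON batches needed to fill one page."""
    if per_page % BATCH_SIZE != 0:
        # Otherwise a batch would straddle two pages and be fetched twice.
        raise ValueError("per_page must be a multiple of BATCH_SIZE")
    first = (page - 1) * per_page
    last = min(first + per_page, MAX_RESULTS)  # exclusive end, capped at 64
    return list(range(first, last, BATCH_SIZE))
```

With 16 results per page, page 1 needs batches starting at 0 and 8 and page 4 needs 48 and 56; page 5 needs none. With 12 per page, page 1 would need results 0 to 11 (batches 0 and 8) and page 2 results 12 to 23 (batches 8 and 16), so batch 8 would be read twice, which is the inefficiency the first note warns about.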

Updated algorithm for LDAP search.

  • The updated regular expression only matches results that contain the query as a whole word; it no longer matches results that merely contain the query as a substring. For example, a search for “abc” returns results containing “abc”, but not results such as “dfabc”.
  • Now supports +, - and *.
    • Adding a + in front of a search string makes the parser search for it as an independent string.
    • Adding a - in front of a search string excludes any results containing that string.
    • Adding a * in front of a keyword makes the parser match any result that contains the query as a substring. For example, a search for “*abc” will include results such as “dfabc”.
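A Python sketch of these matching rules, applied to a single candidate string rather than built into an LDAP filter as the real code does. It assumes space-separated terms, reads `+term` as "must appear as a whole word" (our interpretation of "independent string"), and treats `-term` as a substring exclusion because the text says "containing". The function names are hypothetical.

```python
import re


def _whole_word(term: str, text: str) -> bool:
    # \b anchors stop "abc" from matching inside "dfabc".
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches(query: str, text: str) -> bool:
    """Return True if text satisfies every space-separated term in query."""
    lowered = text.lower()
    for token in query.split():
        op, term = (token[0], token[1:]) if token[0] in "+-*" else ("", token)
        if not term:
            continue  # a lone operator with no term is ignored
        if op == "-":
            if term.lower() in lowered:  # exclude anything containing the term
                return False
        elif op == "*":
            if term.lower() not in lowered:  # substring match
                return False
        elif not _whole_word(term, text):  # plain or "+" term: whole word
            return False
    return True
```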

Search suggestions

The class creates suggestions from two tables: search_terms and search_keywords.

Keyword suggestions (synonyms)

The function first looks in the keyword table. If it finds a matching keyword, it suggests that to the user. This is convenient when, for example, the user searches for "ERS" meaning Environment and Resource Studies.
Synonym terms are currently managed in CPAOnline.
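A minimal Python sketch of the keyword lookup, with a dict standing in for the search_keywords table. Case-insensitive, whitespace-normalised matching is an assumption; the "ERS" entry is the example from the text.

```python
# Stand-in for the search_keywords table: keyword -> suggested title.
KEYWORDS = {
    "ers": "Environment and Resource Studies",
}


def keyword_suggestion(query: str, keywords: dict = KEYWORDS):
    """Return the suggested title for an exact keyword match, or None."""
    return keywords.get(" ".join(query.split()).lower())
```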

Spelling suggestions

If nothing is found, it looks at the titles of the keywords and tries to find similar-sounding phrases. For example, if the user searches for "daly bulletin", it will recommend "Daily Bulletin" from the keywords. Any suggestion with less than 75% similarity to the query is discarded. This prevents suggestions when the user searches for very short phrases.
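The 75% cut-off can be sketched with Python's difflib.SequenceMatcher, whose ratio() returns a similarity between 0 and 1. This is a stand-in: the text says "similar sounding", and the class may use a phonetic or edit-distance measure instead (PHP provides soundex(), metaphone(), similar_text() and levenshtein()), so only the threshold logic is meant to carry over.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.75  # suggestions below 75% similarity are discarded


def spelling_suggestion(query: str, titles: list):
    """Return the keyword title most similar to the query, or None."""
    q = query.lower().strip()
    best_title, best_score = None, 0.0
    for title in titles:
        score = SequenceMatcher(None, q, title.lower()).ratio()
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= THRESHOLD else None
```

"daly bulletin" and "daily bulletin" share 13 matching characters out of 27 in total, giving a ratio of about 0.96, well above the threshold.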

As a last resort, the function then looks at the most popular search terms that sound similar to the query. It only considers search terms with 100+ hits, so it won't give random suggestions.
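One way to read "similar sound" is a phonetic key such as Soundex, which PHP exposes as soundex(). The Python sketch below implements American Soundex and picks the most-searched term that shares the query's key, ignoring terms with fewer than 100 hits. Whether the class actually uses Soundex is an assumption, and `popular_suggestion` is a hypothetical name. Soundex keeps the first letter verbatim, which is one reason a misspelled first letter defeats this kind of matching.

```python
MIN_HITS = 100  # only terms with 100+ hits are considered

# Soundex digit for each consonant; vowels, h, w and y have no code.
_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
            if len(key) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it counts
    return (key + "000")[:4]


def popular_suggestion(query: str, term_hits: dict):
    """Most-searched term with the same Soundex key as the query, or None."""
    target = soundex(query)
    if not target:
        return None
    candidates = [
        (hits, term)
        for term, hits in term_hits.items()
        if hits >= MIN_HITS and soundex(term) == target
    ]
    return max(candidates)[1] if candidates else None
```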

The spell-checking step won't work well if the user misspells the first or last letter of the query. In practice this is not much of a problem: people recognize words largely by their first and last letters, so a mistake in either position is easy for the user to spot.

Feedback form

The feedback form allows users to tell us when the search didn't bring up the results they were looking for. Each submission is emailed to the admin user defined in the config file and stored in the search_feedback database.
In CPAonline there is space to add notes to this feedback, which is helpful for recording why certain problems are happening and as a starting point for fixing them. The hidden option lets you hide anything that won't help improve search (e.g. spelling mistakes, or content we simply don't have).
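A Python sketch of what a submission does, using an in-memory SQLite table and an email.message.EmailMessage in place of the real MySQL table and mailer. The column names (including `notes` and `hidden`, suggested by the CPAonline options), the ADMIN_EMAIL address and the subject line are all hypothetical; the real values live in the config file and the search_feedback schema.

```python
import sqlite3
from email.message import EmailMessage

ADMIN_EMAIL = "search-admin@example.com"  # hypothetical; really set in config


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway search_feedback table with assumed columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_feedback ("
        "id INTEGER PRIMARY KEY, query TEXT, comment TEXT, "
        "notes TEXT, hidden INTEGER DEFAULT 0)"
    )
    return conn


def submit_feedback(conn: sqlite3.Connection, query: str, comment: str) -> EmailMessage:
    """Store one feedback entry and build the notification email for the admin."""
    # Parameterised SQL keeps user text out of the statement itself.
    conn.execute(
        "INSERT INTO search_feedback (query, comment, hidden) VALUES (?, ?, 0)",
        (query, comment),
    )
    conn.commit()
    msg = EmailMessage()
    msg["To"] = ADMIN_EMAIL
    msg["Subject"] = f"Search feedback: {query}"
    msg.set_content(comment)
    return msg  # actually sending it (e.g. with smtplib) is left out
```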

Dynamic Search Suggestion (currently not implemented)

An unobtrusive JavaScript implementation of the dynamic search suggestion function is available in these files under pole.uwaterloo.ca/cpadev/web/search:

  • Templates/forms/search_suggestion.js
  • Templates/forms/generateSearchSuggestions.php

This autofills suggestions as the user types in the search form.

To implement this function, link the page to the JS script. The script looks for a text box with the id "searchterm" and automatically attaches a box of suggested terms, which are taken from the database on strobe.
The PHP file should be moved to a static URL so that it is available to other servers as well (be sure to change the links in the JS file if you do that).
The search terms database is on the strobe server. Given the size of the table, querying it from outside strobe is too slow to be of any use. We should look into more efficient ways to query a non-local database so that suggestions pop up instantaneously.
The script is missing arrow-key navigation through the suggestions and a way to choose the selected suggestion when the user presses Enter. It is also missing some error-handling code; for example, try entering multiple spaces.
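The server side of the autocomplete (the job of generateSearchSuggestions.php) can be sketched in Python as a prefix lookup over popular terms. The sketch assumes terms are ranked by hit count and shows one way to handle the multiple-spaces case: collapse whitespace and return nothing for blank input. The function name and the default limit of 10 are arbitrary choices.

```python
def suggestions_for(prefix: str, term_hits: dict, limit: int = 10) -> list:
    """Return up to `limit` known terms starting with prefix, most searched first."""
    cleaned = " ".join(prefix.split()).lower()  # "  daily   b" -> "daily b"
    if not cleaned:
        return []  # whitespace-only input yields no suggestions
    matching = [t for t in term_hits if t.startswith(cleaned)]
    # Sort by descending hits, then alphabetically to keep ties stable.
    matching.sort(key=lambda t: (-term_hits[t], t))
    return matching[:limit]
```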

Results differing from Google.com

Sometimes our results differ from what Google itself returns. Try, for example, "biochemistry" in our search vs. on Google. A bug has been filed about this issue.
Normally the variation isn't large (e.g. the first two results are flipped; often everything is the same until after the 4th or 5th position).

Updating Search

Search is managed through Subversion, so to update it, all you have to do is run svn update.
