SearchToHTML ReadMe

September 7, 2004
Introduction
Adding SearchToHTML to your site
Examples
How it works
Changing the style of the search results
Requirements
Multibyte encodings support
Source
Contact
Alternatives
License

Other documentation

Parameters guide
Releases

Introduction

The SearchToHTML applet provides your visitors with a simple way to search your Web site. Search results are displayed in HTML. Using HTML provides your visitors with a familiar interface and provides you with greater control over the display of the search results. Other features of the applet include picking out the closest anchor to a match and displaying the context surrounding a match.

The summer 2004 release of SearchToHTML represents a major advancement over previous releases. The search form has been converted to pure HTML, making the interface much lighter weight and even more customizable. Search results are now split over multiple pages, as traditional search engines do, removing any practical limit on the number of results that may be displayed. Multibyte text encodings are supported. Multiple matches may now be returned for a single page.

Unfortunately, the summer 2004 release is incompatible with earlier releases. In particular, you will have to merge your previous applet tags with the code in appletframe.html. Also, several interface elements have moved from the applet to the search form, meaning the corresponding parameters have no effect.

SearchToHTML was selected as JavaBoutique's Applet of the Week for the week of May 25, 2001.

You may use SearchToHTML for free for non-commercial and evaluation purposes. A commercial release is forthcoming. In the meantime, contact dfaden@gmail.com to make other licensing arrangements.

Adding SearchToHTML to your site

To add SearchToHTML to your site, do the following:

1. Edit appletframe.html to fit your needs. The parameters understood by the applet are described in docs/param.html.

You must set the files parameter or the files_file parameter so that the applet knows which files to search. You may also set both.

The files parameter's value should consist of a list of the files to be searched, each file name separated from the others by a comma, space, tab, or line ending. The file names are expected to be relative to the file displaying the applet, most likely appletframe.html.

For example, if the applet is displayed at http://www.geocities.com/gilbertnews/appletframe.html and you want the applet to search the file http://www.geocities.com/gilbertnews/index.html, you would add index.html to the list of files to search.

2. Upload the required files to your server. The required files are:
  1. searchframes.html
  2. blank.html
  3. search.jar
  4. appletframe.html
  5. results.js
  6. displaymatches.html

In a normal distribution of SearchToHTML, these files are contained in the "required" subdirectory.

searchframes.html serves as a container for the applet and search results. blank.html simply serves as filler. search.jar contains the applet's code. appletframe.html displays the applet. results.js contains the script used to display the search results. displaymatches.html serves as the default target for displaying those results.

3. Paste the following HTML code into the pages where you want the search form to appear. Note that this code does not display the applet. It is simply an HTML form. Submitting the form will in turn load the applet.

Make sure that the action of the form points to the actual location of searchframes.html.

The only required form element is the input field named "q".
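
As an illustration of the files value described in step 1, here is a minimal JavaScript sketch of how such a list could be split and resolved. It is not the applet's own code (the applet is written in Java). The function name resolveFileList, the extra file names, and the choice to treat a run of separators as a single separator are assumptions made for the example.

```javascript
// Hypothetical helper, not part of SearchToHTML: splits a "files" value on
// commas, spaces, tabs, or line endings and resolves each name against the
// URL of the page that displays the applet.
function resolveFileList(filesValue, appletPageUrl) {
  return filesValue
    .split(/[,\s]+/) // assumption: a run of separators counts as one
    .filter(function (name) { return name.length > 0; })
    .map(function (name) { return new URL(name, appletPageUrl).href; });
}

// index.html comes from the example in step 1; the other names are made up.
var urls = resolveFileList(
  "index.html, news/archive.html\tabout.html",
  "http://www.geocities.com/gilbertnews/appletframe.html"
);
console.log(urls.join("\n"));
```

Because the names are resolved against appletframe.html, index.html ends up pointing at the same directory as the applet page, as in the example in step 1.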

Examples

The examples directory contains a couple of examples of how to use the applet.

The example starting at searchform.html shows a basic search form. appletframe.html demonstrates many of the applet's parameters.

The example starting at searchform_innerhtml.html demonstrates one way of writing the applet tag dynamically in response to the user submitting the form. Unfortunately, this example is not very robust or compatible across browsers.

How it works

A search's life cycle has three main parts, typically as follows:

1. A search begins with the user submitting a search form. The user's search is encoded in the URL of a frameset containing the applet.
2. After the applet loads, it decodes the user's search and carries it out. Once the search is finished, the applet displays the first page of search results and changes its interface to allow the user to move between the results pages. To display a page of results, the applet encodes the data for that page of results in a URL.
3. The script corresponding to this URL decodes the data and displays it.

You may customize the search in any of the three parts:

1. You may trim the HTML code of the search form down to just the search box (q), or you might add extra elements to let the user control more of the applet's behavior. The applet checks the URL submitted from the search form for its parameter values, so you may set any parameter of the applet by adding the appropriate field to the search form. For instance, you could dynamically specify which files to search by adding a "files" field to the search form.
2. You may customize the working of the applet by changing its parameters. See the Parameters guide for details.
3. You may change the results display script. Some of the possible changes are described in the section "Changing the style of the search results" below.
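
To make the idea of form fields acting as applet parameters more concrete, here is a rough JavaScript sketch. It assumes the form is submitted with the GET method, so that its fields appear in the query string. The function names, the www.example.com address, and the default values are all hypothetical. The applet's real decoding is done in Java and may differ.

```javascript
// Builds the URL that a GET form with these fields would request.
function buildSearchUrl(action, fields) {
  return action + "?" + new URLSearchParams(fields).toString();
}

// Reads the fields back from the URL and lets them override defaults,
// mirroring the idea that a form field can set any applet parameter.
function readParameters(url, defaults) {
  var params = Object.assign({}, defaults);
  new URL(url).searchParams.forEach(function (value, name) {
    params[name] = value;
  });
  return params;
}

// Hypothetical address and field values, for illustration only.
var url = buildSearchUrl("http://www.example.com/searchframes.html", {
  q: "school board",
  files: "index.html,about.html"
});

// Assumed defaults, standing in for values set in appletframe.html.
var params = readParameters(url, { files: "index.html", send_context: "true" });
console.log(url);
console.log(params);
```

In this sketch the "files" field from the form replaces the default list, while send_context keeps its default because the form does not mention it.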

The applet automatically displays the first page of results when it is loaded. If the applet is always loaded when the user pages back to it, this can make it impossible to page back before the applet. To solve this problem, we set a cookie when the applet is displayed. If this cookie is set, the applet is not loaded. To allow a new search to begin, we unset the cookie when the user submits the search form.
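
The cookie logic described above might look roughly like the following JavaScript sketch. The cookie name searchShown and the helper names are assumptions, since the names SearchToHTML actually uses are not given here. The helpers work on plain strings so that they can be tried outside a browser.

```javascript
// Hypothetical cookie name; the real one is not documented in this ReadMe.
var COOKIE_NAME = "searchShown";

// True if the named cookie appears in a document.cookie-style string.
function hasCookie(cookieString, name) {
  return cookieString.split(";").some(function (pair) {
    return pair.trim().split("=")[0] === name;
  });
}

// Cookie string that marks the applet as already displayed.
function markShown() {
  return COOKIE_NAME + "=1; path=/";
}

// Cookie string that clears the mark (an expiry date in the past deletes it).
function clearShown() {
  return COOKIE_NAME + "=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/";
}

// In a browser these would be used roughly as follows:
//   when the search form is submitted:
//     document.cookie = clearShown();
//   in the frame that displays the applet:
//     if (!hasCookie(document.cookie, COOKIE_NAME)) {
//       document.cookie = markShown();
//       // ...write the applet tag here...
//     }
console.log(hasCookie("a=1; searchShown=1", COOKIE_NAME));
console.log(hasCookie("a=1", COOKIE_NAME));
```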

Changing the style of the search results

The applet encodes the search results and sends them on to a script found in results.js. If you would like to change the display of the search results, you'll need to modify results.js.

If you're interested in changing the text of the search results, look under the heading "Text used in search results messages:" in the source of results.js.

If you would like to leave out the matches' context, titles, document info, or anchors from the results page, set the send_context, send_titles, send_info, or send_anchors parameter, respectively, to "false". You may also control what is displayed by changing the values of the following variables in results.js: displayAnchor, displayDocInfo, or displayContext. However, for production code you should always prefer changing the applet's parameters over changing the JavaScript. Otherwise, the applet will needlessly transmit information that is never displayed.

The formatting of modification dates and file sizes is also now left to results.js. If the default date and file size displays don't fit your tastes or locale, you may edit the formatFileSize or formatDate functions to get what you want.
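
As a starting point, replacement versions of the two functions could look something like the sketch below. The function names come from results.js, but the signatures shown here (a size in bytes, and a time in milliseconds since January 1, 1970 UTC) are assumptions. Check how results.js calls these functions before adapting the sketch.

```javascript
// Assumed signature: takes a size in bytes, returns display text.
function formatFileSize(bytes) {
  if (bytes < 1024) return bytes + " bytes";
  if (bytes < 1024 * 1024) return (bytes / 1024).toFixed(1) + " KB";
  return (bytes / (1024 * 1024)).toFixed(1) + " MB";
}

// Assumed signature: takes milliseconds since 1970-01-01 UTC and returns
// the date as YYYY-MM-DD, computed in UTC so the result does not depend
// on the visitor's time zone.
function formatDate(millis) {
  var d = new Date(millis);
  function pad(n) { return (n < 10 ? "0" : "") + n; }
  return d.getUTCFullYear() + "-" + pad(d.getUTCMonth() + 1) + "-" +
         pad(d.getUTCDate());
}

console.log(formatFileSize(2048));
console.log(formatDate(Date.UTC(2004, 8, 7)));
```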

If you would like only the titles of search matches to be shown, rather than both the titles and the file names, set the var preferTitleOnly of results.js to true. If the title is unavailable, the file name will be shown instead, in any case.

If you would like only a link to the closest anchor to be listed, rather than both a link to the document and the closest anchor, set the var anchorTheLink of results.js to true. If this is done, the closest anchors will not be listed separately.

The search results are displayed as a list of links in the HTML generated by results.js. Set the var targetFrame to the frame you'd like a search result link to be opened in.
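
The following JavaScript sketch shows one way the three variables above could interact when a single result is written out. The variable names preferTitleOnly, anchorTheLink, and targetFrame come from results.js. The renderMatch function and the shape of the match object are hypothetical, and the real script may build its HTML quite differently.

```javascript
// Variable names are taken from results.js; the values are only examples.
var preferTitleOnly = true;
var anchorTheLink = true;
var targetFrame = "_top";

// Hypothetical match shape: { file, title, anchor }.
// HTML escaping is left out to keep the sketch short.
function renderMatch(match) {
  var label;
  if (!match.title) {
    label = match.file; // no title available: fall back to the file name
  } else if (preferTitleOnly) {
    label = match.title;
  } else {
    label = match.title + " (" + match.file + ")";
  }

  var anchorHref = match.file + "#" + match.anchor;
  var href = (anchorTheLink && match.anchor) ? anchorHref : match.file;
  var html = '<a href="' + href + '" target="' + targetFrame + '">' +
             label + "</a>";

  // When the main link is not the anchor link, list the anchor separately.
  if (!anchorTheLink && match.anchor) {
    html += ' <a href="' + anchorHref + '" target="' + targetFrame + '">#' +
            match.anchor + "</a>";
  }
  return html;
}

console.log(renderMatch({ file: "index.html", title: "Home", anchor: "news" }));
```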

Requirements

SearchToHTML requires that your users' browsers support JDK 1.1 or above. SearchToHTML also requires that your users have JavaScript enabled. JavaScript is used in displaying the search results and in setting up the applet tag.

SearchToHTML is best suited to small to mid-sized sites. Each document searched must be downloaded from the server or local file system. For large sites this could take an unacceptably long time over a slow connection. If you do have a large site, you are probably better off using a server-side solution. A few server-side search engines are listed in the Alternatives to SearchToHTML section.

The tests require JDK 1.2 to build, unlike the rest of the applet's code. They are not part of the default build.

Multibyte encodings support

SearchToHTML now supports multibyte encodings using Java's built-in support. If the server does not provide the document encoding, SearchToHTML looks for the encoding in a meta http-equiv tag near the top of the document. For example, if SearchToHTML finds <meta http-equiv="Content-Type" content="text/html; charset=gb2312"> near the top of a document, it will take that document to be using GB2312 and reinterpret its bytes under that assumption.
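
The lookup can be pictured with the JavaScript sketch below. This is only an illustration. The applet does this work in Java, and the function name findMetaCharset and the exact matching rules are assumptions, not a description of the applet's code.

```javascript
// Looks for charset=... inside a <meta http-equiv="Content-Type" ...> tag
// in the given text (for example, the first part of a document).
// Returns the encoding name, or null if no such tag is found.
function findMetaCharset(htmlStart) {
  var metaTags = htmlStart.match(/<meta\b[^>]*>/gi) || [];
  for (var i = 0; i < metaTags.length; i++) {
    var tag = metaTags[i];
    if (/http-equiv\s*=\s*["']?content-type/i.test(tag)) {
      var m = /charset\s*=\s*["']?([\w.:-]+)/i.exec(tag);
      if (m) return m[1];
    }
  }
  return null;
}

var sample = '<html><head><meta http-equiv="Content-Type" ' +
             'content="text/html; charset=gb2312"></head>';
console.log(findMetaCharset(sample));
```

A document without such a tag yields null in this sketch, leaving the caller to fall back to some default encoding.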

If an encoding you need does not seem to be supported by the applet, please send an e-mail to dfaden@gmail.com with a report of the problem. Please include a sample document along with information about your browser and operating system.

Source

SearchToHTML's source code is included with this distribution. Send your questions or comments on the source to dfaden@gmail.com.

The source as well as the applet is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike License. A commercial release of the applet is forthcoming.

An Ant build.xml file is included. You will need to modify build.xml in order to use it on your file system. In particular, you will need to provide the location of your JDK 1.1 classes library (and your JUnit library, if you choose to build the tests).

Contact

David Faden <dfaden@gmail.com>

If you experience any problems with the applet or have suggestions, please e-mail dfaden@gmail.com. Want a new feature? Think the documentation is lacking? Send in an e-mail. Please remember, however, that the SearchToHTML applet is distributed without warranty or guarantee. Use it at your own risk.

Alternatives to SearchToHTML

There are now many client-side Java search engines in addition to SearchToHTML. Rather than trying to maintain my own list, below I have included some links to lists of links to Java search engines.

If your site has many pages, you may be better off running a server-side search engine yourself or using the services of a third-party search provider. I have not personally used any of the products or services listed below.

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

For other licensing arrangements, contact dfaden@gmail.com.

SearchToHTML was originally released as linkware for The Gilbert Post. The Gilbert Post was an independent, student-run paper for Gilbert High School of Gilbert, Iowa. However, no one picked up work on the paper after I graduated, so it now seems futile to continue advertising for it.

Reverse Fad Productions