The Netscape Web Publisher search function provides you with the ability to search the file information and contents of documents on a remote server. Server documents can be in a variety of formats, such as HTML, Microsoft Word, Adobe PDF, and WordPerfect. The server converts many types of non-HTML documents into HTML as it indexes them so that you can use your web browser to view the documents that are found for your search.
You can search through server documents for a specific word or attribute value, obtaining a set of search results that list all documents that match the query. You can then select a document from the list to browse it in its entirety. This provides easy access to server content.
There are four parts to text searching:
Search home page
There is a search home page, at http://search-ui/examples, that provides individual links to the search query interfaces, samples of search input and output, and a brief tutorial on how your server administrator can customize the interface.
To enable searching capability on your server, the server administrator begins by identifying the documents that you want to be able to search. Before you can execute searches, you need a database of searchable data against which you can target your searches. Your server administrator has to create a document information database, called a collection, that indexes and stores the content and file properties for each of the documents you want to be able to search.
In the case of Web Publisher, there is a default web publishing collection that contains all the documents that you have published, uploaded, or otherwise manipulated through Web Publisher. Your server administrator can also do bulk indexing of web publishing data for you, for example, by indexing all the documents in the document directory defined for Web Publisher.
Collections contain such information as the format of the documents, the language they are in, their searchable attributes, the number of documents in the collection, the collection's status, and a brief description of the collection. For more details, see the section "Displaying collection contents."
Server documents can be in a variety of formats, such as HTML, Microsoft Excel, Adobe PDF, and WordPerfect. If there is a conversion filter available for a particular file format, the server converts the documents into HTML as it indexes them so that you can use your web browser to view the documents that are found for your search.
There are conversion filters for documents in these formats:
Note
If a PDF file is password-protected or contains special graphical navigation icons, the conversion filter cannot index the file.
Certain file formats have a default set of attributes that are indexed for files of that type, as shown in Table 5.1. Note that ASCII files have no default attributes.
By default, HTML collections only have Title and SourceType attributes, but they can be set up to also permit searching and sorting by up to 30 file attributes tagged with the HTML <META> tag.
For example, a document could have these META-tagged attributes:
<META NAME="Writer" CONTENT="J. S. Smith">
<META NAME="PubDate" CONTENT="07-24-97">
<META NAME="Product" CONTENT="Communicator">
If this document had been indexed with its META tags extracted, you could search it for specific values in the writer, publication date, or product fields. For example, you could enter this query: Writer <contains> Smith or PubDate > 1/1/97.
Any attribute values in META-tagged fields are text strings only, which means that dates and numbers are sorted as text, not as dates or numbers. Also, illegal HTML characters in a META-tagged attribute are replaced with a hyphen.
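To make the idea of META-tagged attributes concrete, here is a minimal sketch in Python of how NAME/CONTENT pairs could be pulled out of a document. It is not the server's indexing code; the class and function names are hypothetical, and it only illustrates that every extracted value is a plain text string.

```python
from html.parser import HTMLParser


class MetaAttributeParser(HTMLParser):
    """Collect NAME/CONTENT pairs from <META> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and attribute names.
        if tag != "meta":
            return
        pairs = dict(attrs)
        name, content = pairs.get("name"), pairs.get("content")
        if name is not None and content is not None:
            self.attributes[name] = content  # always stored as text


def extract_meta_attributes(html_text):
    """Return a dict of META-tagged attribute names and their text values."""
    parser = MetaAttributeParser()
    parser.feed(html_text)
    return parser.attributes


sample = (
    '<META NAME="Writer" CONTENT="J. S. Smith"> '
    '<META NAME="PubDate" CONTENT="07-24-97"> '
    '<META NAME="Product" CONTENT="Communicator">'
)
print(extract_meta_attributes(sample))
```

Note that the PubDate value comes back as the string 07-24-97, not as a date object, which is why comparisons and sorts on such fields behave as text operations.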
Users are primarily concerned with querying the data in the search collections and getting a list of documents in return. The default installation of the Enterprise server includes a set of search query and result pages to allow users a quick and easy way of doing searches.
There are three default search query pages: standard and advanced HTML forms and a Java-based guided applet.
On the standard search form, you select a collection to search against and type in a word or phrase to search for using the query language operators.
On the advanced HTML form, you have the additional options of selecting multiple collections to search through, establishing a sort sequence for the results, and defining how many documents are to be displayed on a page at a time (clicking the Prev and Next arrows moves you through the pages of results).
In the guided Java-based search applet, the applet uses several drop-down lists to guide you through constructing a query. You must have Java enabled for your browser to use this applet.
To perform a standard search, follow these steps:
The standard search query page
You can choose to use the advanced HTML search form, which helps you construct the query. This form is especially useful if you want to search through more than one collection or to produce results sorted by a specific attribute value.
To access advanced HTML search through the standard search query page, follow these steps:
The advanced HTML search query page
You can choose to use the Java-based guided search interface, which helps you construct the query. This is especially useful if you want to build a query that has several parts, such as searching for a word in the documents' content as well as for a specific attribute value.
Make sure Java is enabled for your browser. To do this, use the Languages option preferences menu command.
To access guided search from the standard search query page, follow these steps:
The guided search query applet
There are two standard types of search results: a list of all documents that match the search criteria and the text of a single document that you selected from the list of matching documents.
Which documents you get in your search results depends on the access control rules set for each of the documents and collections involved. The server does an access check when you perform these actions:
If the server encounters an access control rule that restricts your access to a document that matches your query, the document is not listed as part of the search results. If you do not have permission to view a document listed in the search results, the server does not display it.
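The filtering described above can be pictured with a small Python sketch. This is an illustration only, not the server's access control mechanism: the acl mapping, the user names, and the function name are all hypothetical.

```python
def visible_results(matches, user, acl):
    """Keep only the matching documents that `user` is allowed to read.

    `acl` is a hypothetical mapping from document URL to the set of users
    permitted to view it; documents missing from the mapping are treated
    as unrestricted in this sketch.
    """
    visible = []
    for url in matches:
        allowed = acl.get(url)
        if allowed is None or user in allowed:
            visible.append(url)
    return visible


acl = {"/docs/budget.html": {"alice"}}
matches = ["/docs/index.html", "/docs/budget.html"]
print(visible_results(matches, "bob", acl))    # ['/docs/index.html']
print(visible_results(matches, "alice", acl))  # both documents
```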
In the default installation of the Netscape Enterprise Server, when you execute a search from either the simple or advanced search query pages, you obtain a list of the documents that match your search criteria. The list gives some standard information about each file, depending on the collection's format. For example, the default results page for email collections gives subject, to, from, and date for each entry, and the page for news collections gives subject, from, and date for each entry.
The kind of file format in the collection indicates which default attributes are available for searching. See "About collection attributes" and Table for information about the attributes for each format.
For entries resulting from a search that checks for comparative proximity of words to each other or for the exactness of the match, the file's ranking can be provided by showing a score.
If there are more matching documents than can fit on a page, click Next to see the next batch. You can always execute a new search by entering new query data and clicking Search.
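Paging through results in batches amounts to slicing the full list of matches. The following Python sketch shows the arithmetic; the function names, the zero-based page numbering, and the default of 10 documents per page are assumptions made for the example, not server settings.

```python
def page_of_results(results, page, per_page=10):
    """Return the batch of results shown on a zero-based page number."""
    start = page * per_page
    return results[start:start + per_page]


def page_count(results, per_page=10):
    """Number of pages needed to show every result."""
    return (len(results) + per_page - 1) // per_page


results = [f"doc{i}.html" for i in range(25)]
print(page_count(results))               # 3
print(len(page_of_results(results, 2)))  # 5 (the last, partial page)
```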
By default, or if you don't enter anything in the Sort By field on the advanced HTML query page, all documents matching the search are output according to their relevance ranking (for queries that consider this) or their position in the server file database (for other queries).
If you enter an attribute name in the Sort By field, the documents are displayed in an ascending sort sequence. You can list the documents in a descending sort sequence by adding a minus sign (-) prefix to the attribute, as in -keywords or -title. You can do a multiple sort, by typing in more than one field, as in Author,-PubDate.
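As an illustration of how a Sort By value such as Author,-PubDate can be read, here is a minimal Python sketch. It is not the server's implementation; the function names and the sample documents are hypothetical, and attribute values are compared as text.

```python
def parse_sort_by(sort_by):
    """Turn 'Author,-PubDate' into [('Author', False), ('PubDate', True)]."""
    keys = []
    for field in sort_by.split(","):
        field = field.strip()
        if not field:
            continue
        descending = field.startswith("-")
        keys.append((field[1:] if descending else field, descending))
    return keys


def sort_documents(documents, sort_by):
    """Sort attribute dicts; later sort keys break ties in earlier ones."""
    result = list(documents)
    # Python's sort is stable, so apply the keys from last to first.
    for name, descending in reversed(parse_sort_by(sort_by)):
        result.sort(key=lambda doc: doc.get(name, ""), reverse=descending)
    return result


docs = [
    {"Author": "Smith", "PubDate": "07-24-97"},
    {"Author": "Jones", "PubDate": "01-15-97"},
    {"Author": "Smith", "PubDate": "09-01-97"},
]
for doc in sort_documents(docs, "Author,-PubDate"):
    print(doc["Author"], doc["PubDate"])
```

With this input, the Jones entry comes first, followed by the two Smith entries with the later PubDate string listed before the earlier one.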
In a short query, sort order usually isn't critical, but in queries that result in a great many matches, you may want to set a sort value in order to obtain useful search results. Note, however, that using a special sort sequence may slow down the search.
Attribute values in META-tagged fields are text strings, which means that dates and numbers are sorted as text, not as dates or numbers. To convert the value into a date or number, you can create a new property in the Web Publishing|Add Custom Property form and check the box that marks this property as a META-tagged attribute.
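The difference between sorting as text and sorting as dates or numbers is easy to see in a short Python sketch. The sample values are made up, and the MM-DD-YY date format is an assumption based on the PubDate example earlier in this chapter.

```python
from datetime import datetime

pub_dates = ["07-24-97", "01-15-98", "12-01-96"]
page_counts = ["9", "10", "100"]


def sort_as_text(values):
    """Plain string comparison, character by character."""
    return sorted(values)


def sort_as_dates(values, fmt="%m-%d-%y"):
    """Parse each value as a date before comparing (format is assumed)."""
    return sorted(values, key=lambda v: datetime.strptime(v, fmt))


print(sort_as_text(pub_dates))        # ['01-15-98', '07-24-97', '12-01-96']
print(sort_as_dates(pub_dates))       # ['12-01-96', '07-24-97', '01-15-98']
print(sort_as_text(page_counts))      # ['10', '100', '9']
print(sorted(page_counts, key=int))   # ['9', '10', '100']
```

As text, the 1998 date sorts first because its month digits come first, and 9 sorts after 100 because the character 9 is greater than the character 1.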
In the default installation of Netscape Enterprise Server, when you obtain a list of the documents that match your search criteria, you can select a single document to display in your web browser. The browser can display the original document or you can choose to display the document with additional formatting so that your search query word or phrase is highlighted with such text attributes as color, boldface, or blinking.
To view the original document, click on the hypertext link containing the document's URL. In the case of documents that have been converted into HTML, the URL points you to the original document. Clicking on this link spawns an external viewer to display the document in its original format.
To view a highlighted document, click on the graphical element next to the document's entry in the search results.
You can display the contents of your collection database to see which attributes are set for each collection. Your server administrator may have defined some collections as non-displayable, in which case they are not included in the output. The collection contents typically include these items:
To display your collection database contents, type this line in the web browser's URL location field (be sure not to include any spaces):
http://yourServer/search?NS-search-page=c
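If you want to generate this URL from a script rather than type it, the following Python sketch assembles it with the standard library. The function name is hypothetical, and yourServer is the same placeholder used above; substitute your own server's host name.

```python
from urllib.parse import urlencode, urlunsplit


def collection_contents_url(server):
    """Build the collection-contents URL for a given server host name."""
    query = urlencode({"NS-search-page": "c"})
    return urlunsplit(("http", server, "/search", query, ""))


url = collection_contents_url("yourServer")
print(url)  # http://yourServer/search?NS-search-page=c
```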
To perform an effective search, you need to know how to use the query operators. You can only do Boolean searches, so all the subsequent information is based on Boolean search rules.
Note
The query language is not case-sensitive. The examples use uppercase for clarity only.
The search engine interprets the search query based on a set of syntax rules. For example, by entering the word region, the actual word region and all its stemmed variations (such as regions and regional) are found. The search results are ranked for "importance," which means how close the matched word comes to the originally input search criteria. In the example above, region would rank higher than any of the stemmed variants.
Not all queries rank their results. For example, queries that check whether a given string matches the value in a field cannot perform a comparison: either the string matches the value or it doesn't. The same is true for checking whether a string is contained in a field, or begins or ends a field.
The search query language has some implicit defaults and assumptions that dictate how it interprets your input. In some cases, you can circumvent the defaults, but here is how the search engine decides what you want as the search results:
<STEM>--Search finds all documents that contain any stemmed variant of the search word or phrase. The search engine looks at the meaning of the word, not just its spelling. For example, if you want to search on plan, the results would include documents that contain planning and plans, but not those that contain plane or planet.
<MANY>--Search considers how often the search word or phrase appears in the found documents and ranks the results for frequency (or relevancy).
<PHRASE>--Search considers words separated by spaces to be part of a phrase. For example, Monterey otter is interpreted as a phrase and both must be present and together to be found. Such a search would not find documents containing sea otter or Monterey Bay.
Note
In any case where it's not clear that two words are to be considered as a phrase, you can use parentheses for clarity. For example,
<PHRASE> (rise "and" fall).
OR--Search considers each word or phrase in the query separated by a comma to be optional, although at least one must be present. In effect, this is an implicit OR operation. For example, Monterey, otter is interpreted as searching for documents that contain either Monterey or otter. Note that angle brackets are not required for OR.
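The implicit phrase and implicit OR rules can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the search engine's parser: it ignores stemming, ranking, and explicit operators, and the function names are hypothetical.

```python
import re


def parse_simple_query(query):
    """Commas separate alternatives (implicit OR); the words within each
    alternative form a phrase."""
    alternatives = []
    for part in query.split(","):
        words = [w.lower() for w in part.split()]
        if words:
            alternatives.append(words)
    return alternatives


def contains_phrase(tokens, phrase):
    """True if the phrase words appear together, in order, in the tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matches(text, query):
    """True if at least one alternative of the query is found in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return any(contains_phrase(tokens, p) for p in parse_simple_query(query))


print(matches("A Monterey otter swims by", "Monterey otter"))      # True
print(matches("A sea otter near Monterey Bay", "Monterey otter"))  # False
print(matches("A sea otter near the shore", "Monterey, otter"))    # True
```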
To create complex searches, you can combine query operators, manipulate the query syntax, and include wildcard characters.
With the exception of the AND, OR, NOT, and the date and numeric comparison operators, you need to enclose query operators in angle brackets, as in <CONTAINS> and <WILDCARD>.
You can combine several query operators into a single query to obtain precise results. For example, you can input the following query to limit your search to those documents that have Bay and Monterey but to exclude those that mention Aquarium:
Monterey AND Bay NOT <CONTAINS> Aquarium
You can achieve even greater precision by including some implicit phrases, as in the following query that finds documents that refer to the Monterey Bay Aquarium by its full name and also mention otters but do not refer to sharks:
Monterey Bay Aquarium AND otter AND NOT shark
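To show how such a combined query narrows the results, here is a rough Python sketch that evaluates terms joined by AND, with an optional NOT in front of a term. It is a simplification and not the search engine's evaluator: it does no stemming or ranking, handles only AND and NOT, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def has_phrase(tokens, phrase_tokens):
    """True if the phrase words appear together, in order."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def matches_and_query(text, query):
    """Every AND term must hold; a term starting with NOT must be absent."""
    tokens = tokenize(text)
    for term in re.split(r"\s+AND\s+", query.strip(), flags=re.IGNORECASE):
        negated = term.upper().startswith("NOT ")
        phrase = tokenize(term[4:] if negated else term)
        if has_phrase(tokens, phrase) == negated:
            return False
    return True


query = "Monterey Bay Aquarium AND otter AND NOT shark"
print(matches_and_query("The Monterey Bay Aquarium has an otter exhibit.", query))
print(matches_and_query("The Monterey Bay Aquarium has an otter and a shark.", query))
```

The first document satisfies all three terms; the second is rejected because it mentions shark.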
You can use any of the query operators as a search word, but you must enclose the word in quotation marks. For example, you could search for documents about the ebb and flow of the tides with the following query:
ebb "and" flow
You can cancel the implicit stemming by using quotation marks around a word. For example, you can be exact by using a query such as this:
"plan"
This search returns only documents that contain the exact word plan. It ignores documents with plans or planning.
You can use AND, OR, and NOT to modify other operators. For example, you may want to exclude documents with titles that contain the phrase theme park. A query such as this would solve this problem:
Title NOT <CONTAINS> theme park
Use the following reference to help determine which operators to use. Note that the query language is not case-sensitive, so <starts> and <STARTS> are equivalent. This document uses uppercase for clarity only.
The following table describes some commonly used operators and provides examples of how to use each one. All are relevance ranked except where explicitly noted.
You can use wildcards to obtain special results. For example, you can find documents that contain words that have similar spellings but are not stemmed variants. For example, plan stems into plans and planning but not plane or planet. With wildcards, you can find all of these words.
Some characters, such as * and ?, automatically indicate a wildcard-based search and do not require you to use the <WILDCARD> operator as part of the expression.
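The effect of * and ? can be approximated with Python's fnmatch module, used here only as a stand-in: the server's wildcard syntax may differ in its details, and the word list and function name are made up for the example.

```python
from fnmatch import fnmatchcase

words = ["plan", "plans", "planning", "plane", "planet", "explain"]


def wildcard_matches(pattern, candidates):
    """Return the candidates that match a * / ? pattern, ignoring case."""
    return [w for w in candidates if fnmatchcase(w.lower(), pattern.lower())]


print(wildcard_matches("plan*", words))
# ['plan', 'plans', 'planning', 'plane', 'planet']
print(wildcard_matches("plan?", words))
# ['plans', 'plane']
```

Here * stands for any run of characters (including none) and ? for exactly one character, so plan* picks up plane and planet, which stemming alone would not find.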
Sometimes you may want to search on characters that are normally used as wildcards, such as * or ?. To use a wildcard as a literal, you must precede it with a backslash. In the case of asterisks, you must use two backslashes. For example, to search on a magazine with a title of Zine***, you would type:
<WILDCARD>Zine\\*\\*\\*
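The escaping rule just described (one backslash before a literal ?, two backslashes before a literal *) can be applied mechanically. The Python sketch below does so character by character; the function name is hypothetical, and the sketch covers only these two wildcard characters.

```python
def escape_wildcards(text):
    """Escape * and ? so that they are treated as literal characters."""
    out = []
    for ch in text:
        if ch == "*":
            out.append("\\\\*")  # two backslashes, then the asterisk
        elif ch == "?":
            out.append("\\?")    # one backslash, then the question mark
        else:
            out.append(ch)
    return "".join(out)


print("<WILDCARD>" + escape_wildcards("Zine***"))
```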
Several characters have special meaning for the search engine and require you to use back quotes to be interpreted as literals. The special search characters are listed here:
For example, to search for the string "a{b", you would type
<WILDCARD>`a{b`
For another example, if you wanted to search on the string "c`t", which contains a back quote, you would type
<WILDCARD>`c``t`
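Both back-quote examples follow the same pattern: wrap the string in back quotes and double any back quote inside it. A minimal Python sketch of that pattern, with a hypothetical function name, looks like this:

```python
def quote_literal(text):
    """Wrap text in back quotes, doubling any back quote it contains."""
    return "`" + text.replace("`", "``") + "`"


print("<WILDCARD>" + quote_literal("a{b"))   # <WILDCARD>`a{b`
print("<WILDCARD>" + quote_literal("c`t"))   # <WILDCARD>`c``t`
```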