RETRIEVAL ASSISTANCE MEDIATED BY
CONCEPTUAL VIEWS AND MULTIPLEX SEARCHING

Richard S. Marcus

Laboratory for Information and Decision Systems (LIDS)

Massachusetts Institute of Technology

Cambridge, MA 02139-4307

CONTACT INFORMATION

Mail: Richard S. Marcus, MIT LIDS, Room 35-414, Cambridge, MA 02139

Email: MARCUS@LIDS.MIT.EDU
Phone: 617/253-2340 (voice mail)
FAX: 617/258-8553

WWW PAGE

http://web.mit.edu/rmarcus/www/

PROGRAM AREA

Adaptive Human Interfaces.

KEYWORDS

Information retrieval; intelligent search agents; expert intermediary systems for text-based retrieval; intelligent document search assistance; contextual, structural, interactive Boolean search models; virtual common database system.

PROJECT SUMMARY

This project studies advanced search assistance techniques that are based on the contextual and structural features inherent in modern interactive Boolean-based retrieval systems and models. Such techniques enable, for example, (1) intelligent database selection and search strategy formulation and (2) dynamic search evaluation (including recall estimation), document relevance ranking, and strategy optimization -- all based on minimal, but detailed, user relevance feedback interacting with a formalized conceptual Boolean topic representation (BTR) derived from system-user dialog. The previously described development of, and experimentation with, simpler versions of the CONIT intermediary retrieval assistant, which was built on a virtual system base, has already demonstrated that novice users can achieve retrieval effectiveness from bibliographic databases as good as that achieved by expert human search intermediaries for the same topics and databases.
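To make the idea of a conceptual Boolean topic representation more concrete, here is a minimal sketch. It assumes that a BTR can be modeled as a set of concept facets that are ANDed together, with the alternative terms inside each facet ORed. It then shows how one representation can support both an exact Boolean match and a simple partial-match ranking by number of facets satisfied. The class and function names (`BooleanTopic`, `rank_by_facets`) are hypothetical. This is not CONIT's actual implementation, and it omits relevance feedback and recall estimation entirely.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class BooleanTopic:
    """Hypothetical Boolean topic representation (BTR): concept facets
    are ANDed together; the alternative terms within a facet are ORed."""
    facets: list[set[str]]

    def facets_matched(self, doc_terms: set[str]) -> int:
        # A facet is satisfied if any one of its terms occurs in the document.
        return sum(1 for facet in self.facets if facet & doc_terms)

    def matches(self, doc_terms: set[str]) -> bool:
        # Exact Boolean match: every facet must be satisfied.
        return self.facets_matched(doc_terms) == len(self.facets)


def rank_by_facets(topic: BooleanTopic, docs: dict[str, set[str]]) -> list[str]:
    # Partial-match ranking: documents satisfying more facets come first;
    # ties are broken by document id so the order is deterministic.
    return sorted(docs, key=lambda d: (-topic.facets_matched(docs[d]), d))


if __name__ == "__main__":
    topic = BooleanTopic([{"retrieval", "search"},
                          {"expert", "intelligent"},
                          {"interface", "gui"}])
    docs = {"d1": {"search", "intelligent", "gui"},
            "d2": {"retrieval", "expert"},
            "d3": {"database"}}
    print(rank_by_facets(topic, docs))   # ['d1', 'd2', 'd3']
    print(topic.matches(docs["d2"]))     # False
```

Under a strict AND-of-ORs only d1 is retrieved. The facet count still places d2 ahead of d3, which is one simple way a Boolean formulation can yield a ranked output.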

The current project investigates how an expert version of the CONIT system could realize the potential of these advanced techniques and achieve a new, higher level of retrieval system effectiveness. The approach is to develop an intelligent search assistant whose functionality is easy to manipulate through a graphical user interface (GUI) built within an object-oriented software structure. Progress on this project has been made in four main areas: (1) additional experimentation to analyze the effectiveness of the advanced techniques; (2) implementation of the framework for an expert GUI CONIT built in an object-oriented software environment; (3) investigation of novel visualization techniques for representing Boolean and vector search formulations; and (4) reporting on research progress, including analysis indicating how the techniques could be applied to text-based databases generally.

PROJECT REFERENCES

Marcus, R.S. "An Experimental Comparison of the Effectiveness of Computers and Humans as Search Intermediaries." Journal of the American Society for Information Science. 34(6):381-404; November, 1983.

Marcus, R.S. "Expert Retrieval Assistance Development and Experimentation." Proceedings of the 51st ASIS Annual Meeting. 25:115-119; October, 1988.

Marcus, R.S. "Computer and Human Understanding in Intelligent Retrieval Assistance." Proceedings of the 54th ASIS Annual Meeting. 28:49-59; October, 1991.

Marcus, R.S. "Intelligent Assistance for Document Retrieval Based on Contextual, Structural, Interactive Boolean Models." Proceedings of the RIAO 94 Conference. Volume 2, pp. 27-43; New York, NY; October 11-13, 1994.

Marcus, R.S. "The RIAO94 Conference and the Status of Information Retrieval: A Personal View." ACM SIGIR Forum. 28(2):7-16; Fall, 1994.

Spoerri, Anselm. "InfoCrystal: Integrating Exact and Partial Matching Approaches Through Visualization." Proceedings of the RIAO 94 Conference. pp. 687-696; New York, NY; October 11-13, 1994.

Tummala, Dinesh R. "Natural Language Issues in Information Retrieval Systems." Master of Science Thesis in Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA. February, 1993.

AREA BACKGROUND

I would describe my discipline as document retrieval, which addresses the problem of finding documents relevant to a topic. Unlike data-specific retrieval (e.g., "How many hours of vacation time has John Smith in the accounting department taken so far this year?"), document-by-topic retrieval generally has no clearly defined answer set. Relevance is a matter of degree and varies with who is judging it. A central reason for this uncertainty is the ambiguity and imprecision of natural language. The searcher generally describes the topic of interest in a natural language statement. Authors write documents in natural language. Documents may be indexed by a combination of terms from the document text, a controlled vocabulary, and the indexers' own words. With so many language users and usages, any matching of a searcher's interests with document contents is difficult to predict or assess.

To circumvent these problems in existing, mainly Boolean-based, document retrieval systems, four paradigms for research and development of enhanced systems have arisen: (1) natural language with deep semantic analysis; (2) statistical (including probabilistic and neural net variations); (3) thesaural; and (4) contextual, structural, interactive Boolean (CSIB, also called "smart Boolean"). Computerizing deep semantics (1) has proven quite difficult. As a result, a number of researchers have adopted a statistical approach (2), which uses the much simpler scheme of capturing relations among texts and questions by analyzing the statistics of word use in the documents and the search statements. Other researchers have sought to perfect controlled indexing vocabularies and their use by devising enhanced thesaural indexing schemes (3) that try to better (re)capture the topical content of the texts. A fourth school (4), in which we may be included, attempts to capture and extend the intelligence of expert human search intermediaries. It does so within the context of both the knowledge-based expertise of these humans and the refined functionality of advanced modern Boolean-based search systems, including field, approximate-match, and proximity searching. Our hypothesis is that this smart Boolean approach, which includes elements of the other three, is the best approach, at least in the near and intermediate term. We expect it to improve the effectiveness and efficiency of document retrieval systems and to make searching a cooperative, rational decision-making process within a human-machine system, in contrast to the seat-of-the-pants, informal process that exists today.
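For contrast with the Boolean approach, the statistical paradigm (2) can be illustrated with the textbook vector-space model covered in Salton and McGill. In that model, documents and queries are weighted term vectors (tf-idf), and documents are ranked by cosine similarity to the query. This is a generic sketch of that model only, not of CONIT or of any system tested at TREC. Whitespace tokenization, raw term frequency, and a plain log(N/df) idf are simplifying assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, str]):
    """Return ({doc_id: {term: weight}}, idf) using raw tf * log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for text in docs.values() for term in set(text.split()))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = {}
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        vectors[doc_id] = {t: c * idf[t] for t, c in tf.items()}
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either has zero length.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_by_cosine(query: str, docs: dict[str, str]) -> list[str]:
    vectors, idf = tfidf_vectors(docs)
    # Weight query terms with the collection idf; unseen terms get weight 0.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.split()).items()}
    scores = {d: cosine(q, v) for d, v in vectors.items()}
    # Highest similarity first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    docs = {"d1": "boolean search retrieval",
            "d2": "vector retrieval model",
            "d3": "neural net model"}
    print(rank_by_cosine("boolean retrieval", docs))  # ['d1', 'd2', 'd3']
```

Unlike the Boolean sketch, there is no notion of a required concept here. Every document receives a graded score from word statistics alone, which is the simplicity (and the limitation) that the paragraph above attributes to this paradigm.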

AREA REFERENCES

Salton, Gerard; McGill, M.J. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY. 1983. [General review of the field with emphasis on statistical techniques.]

Proceedings of the RIAO 94 Conference. New York, NY. October 11-13, 1994. [Good spectrum of current research efforts.]

Harman, Donna K. (editor) The Second Text REtrieval Conference (TREC-2). National Institute of Standards and Technology Report NIST Special Publication 500-215. March, 1994. [Best experimental program for testing existing systems -- emphasizing statistical approaches.]

BYTE, 17(6); June, 1992. [This issue contains comparison of search and retrieval systems.]

Annual Review of Information Science and Technology. [Especially recent years for overall coverage of various aspects of the field.]

[See references for our project above for smart Boolean paradigm emphasis.]

RELATED PROGRAM AREAS

1. Virtual Environments.

2. Speech and Natural Language Understanding.

5. Usability and User-Centered Design.

POTENTIAL RELATED PROJECTS

Researchers interested in the several ISP areas listed above could use the CONIT intermediary system and the document retrieval application as a vehicle for studying these areas.