Task-Based Information Filtering:
Providing Information that is Right for the Job

Paul De Bra, Geert-Jan Houben, Frank Dignum
Department of Computing Science
Eindhoven University of Technology
{debra,houben,dignum}@win.tue.nl

Abstract: Many attempts have been made to provide Internet and Intranet users with tools that aid them in finding valuable information in the many gigabytes of data they have access to. Many company Web-servers are beginning to offer search engines right on the first page, to guide visitors to the information they are looking for. And although large search engines like Alta Vista and Excite sometimes find the appropriate documents, based on just a few well-chosen keywords, most of their answers are not relevant for the user.
The core of the problem with these search engines is their "one size fits all" approach to information retrieval. We propose a different strategy: by using an agent architecture that distinguishes three types of agents (process agents, document warehouse agents and retrieval agents) we can take into account the role of the user in her organization, or the task for which she needs the information. In order to evaluate which documents are relevant for which tasks we propose that cooperative retrieval agents learn to select appropriate documents based on user-feedback.

1. Introduction

Every World Wide Web user has experienced the problem of finding relevant information. Neither subject-based menu systems like Yahoo, nor large search engines like Alta Vista and Excite provide a way to quickly find the documents a user is looking for. Even when one locates a valuable site it is often still difficult to find the appropriate documents on that site. Many Web-servers try to overcome this problem by providing their own miniature version of the large search engines. The information overload on a single site is less dramatic of course, but finding the right documents can still be a problem even on a single site.

The core of this problem is that all available search tools select documents based on the textual content of the document, and not on the purpose or task the document is written for. When one connects to a typical Web-server, information is usually presented based on the hierarchical structure of the company or organization. For most visitors this structure is irrelevant. A presentation based on who the users are or what the purpose of their visit is would greatly help most users. But there is still a danger that none of the offered choices matches the reason why a user contacts the site.

We lack a good mechanism to manage an organization's information in such a way that users have easy and efficient access to the information that is relevant for their tasks. This information (management) system should support three aspects of usage:

In this paper we concentrate on the first of these aspects: supporting the users in finding information. It is essential to acknowledge the relationship between

In [HD97] we have described how agents can be used to support the work processes and their activities. These agents contain knowledge about the goals of the process and the standard procedure to fulfill that goal. They also contain knowledge about which information is needed for each step in this standard procedure. Besides the knowledge about the standard procedure they contain a planning module that can be used to construct a plan to reach the goal of the process in those cases when the standard procedure cannot be followed. Here we combine these agents with agents that support the users in finding and receiving information. Thus we construct an information system that supports the enterprise-wide exchange of information.

Specifically, we propose the following cooperation between the process agents and the retrieval agents. When the process agent needs information to support the next step in a business process, it not only sends this request to the retrieval agent but also provides information about the context of the request: it indicates the goal of the process and the role of the information in the activity to reach that goal. In this way the retrieval agents can build up a user-profile based not only on the word usage of retrieved documents, but also on the context in which the user uses the documents. Thus our agents learn why the user considers certain documents relevant.
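The cooperation described above can be illustrated with a minimal sketch. It is not the implementation of [HD97]; the class names, the field names (process_goal, activity, info_role) and the example values are assumptions made only for illustration. The sketch shows a request that carries its business context, and a profile that indexes accepted documents by that context rather than by word usage alone.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InformationRequest:
    """A request from a process agent, carrying its business context."""
    query: str          # what information is needed
    process_goal: str   # goal of the business process (hypothetical label)
    activity: str       # current step in the process (hypothetical label)
    info_role: str      # role the information plays in that step


@dataclass
class RetrievalProfile:
    """User-profile keyed by request context (illustrative assumption)."""
    # (process_goal, activity, info_role) -> counts of accepted documents
    accepted: dict = field(default_factory=lambda: defaultdict(Counter))

    @staticmethod
    def _context(request):
        return (request.process_goal, request.activity, request.info_role)

    def record_accept(self, request, doc_id):
        """Remember that the user accepted doc_id in this context."""
        self.accepted[self._context(request)][doc_id] += 1

    def suggestions(self, request, limit=3):
        """Return the documents most often accepted in the same context."""
        counts = self.accepted[self._context(request)]
        return [doc for doc, _ in counts.most_common(limit)]


claim = InformationRequest("car repair", "handle insurance claim",
                           "assess damage", "cost estimate")
hobby = InformationRequest("car repair", "private hobby",
                           "fix engine", "instructions")

profile = RetrievalProfile()
profile.record_accept(claim, "doc-a")
profile.record_accept(claim, "doc-a")
profile.record_accept(claim, "doc-b")
profile.record_accept(hobby, "doc-c")
```

Both requests use the same query text, yet the profile keeps their accepted documents apart because the context differs.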

2. Background

The nature of information systems has changed over the past decade [KUW94]. Modern information systems are decentralized, autonomous and heterogeneous. This should be acknowledged in the development of new information systems as well. Specifically, this holds for modern organizations that act internationally. These global organizations have to deal with information sources of very different natures.

The modern organization is characterized by more and more non-repetitive work. This leads to the use of more knowledge at the workplace and thus to the need for tools to manage this knowledge. Since, in general, the knowledge is distributed all over the organization and takes a large variety of forms, the user wants to be able to assess the quality of the information. The tools should help the user cope with the heterogeneous information based on this quality assessment. They should thereby reduce the user's information overload and present only information that is relevant and effective for the user.

Other approaches exist that try to filter the information based on a user-profile (e.g. Topic, Affinicast). However, these profiles (often based on common group browsing behaviour) are static in the sense that they are not adjusted for the specific task a user is performing. Also it is difficult to properly incorporate the dynamic and heterogeneous nature of the information in the user-profiles.

We claim that the information that is relevant and effective for the user depends heavily on the task the user is performing at a given moment. In our opinion the information needed by the user depends on situation-related aspects such as the user's experience with the task at hand and whether the standard procedure for the task is followed or not.

The increasing popularity of Work Flow Management Systems (WFMS) supports our claim that it is important to look at the combination of both execution and management of work processes. In WFMS the communication between the users is not adequately supported. However, in modern organizations, where processes are more flexible and thus not very structured, this communication is of prime importance for a successful completion of the tasks.

In our approach we suggest combining the task-based approach of WFMS with communication-based approaches to form a task-based work activity coordination.

3. Task-Based Information Retrieval

In an environment like the World Wide Web, but also in enterprise-wide information systems (e.g. Intranet solutions) in any medium to large-sized organization, information is available on a wide variety of topics. The information comes from many different sources and is used by very different kinds of people. Both the menu-based systems like Yahoo and the huge search engines like Alta Vista and Excite are purely subject oriented. They try to meet the challenge of providing pointers to valuable documents, based on a search pattern which often consists of just a few keywords.

Many approaches exist to improve on this kind of search technique, by using information from more than just a single user query. Golovchinsky [G97a,G97b] assigns weights to search terms based on how many queries ago the search term was used. Queries in his system are actually hypertext links, not user-typed sets of keywords. Fishnet [BL97] is a tool, developed at the Eindhoven University of Technology, that is typical for agent-based retrieval tools that maintain a database of representations of previously returned accepted and rejected documents, in order to form a user model that represents the typical interest of the user. All these types of tools classify documents based on content.
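The idea of weighting a search term by how many queries ago it was used can be sketched as follows. This is only an illustration: the exponential decay and the factor 0.5 are assumptions of this sketch, not the formula used in [G97a,G97b], and the function name is hypothetical.

```python
from collections import defaultdict


def recency_weighted_terms(query_history, decay=0.5):
    """Weight each term by how many queries ago it was used.

    query_history is ordered from oldest to newest. A term in the most
    recent query contributes 1.0, a term one query earlier contributes
    `decay`, two queries earlier `decay ** 2`, and so on. Contributions
    of a term that occurs in several queries are added up.
    """
    weights = defaultdict(float)
    newest = len(query_history) - 1
    for position, terms in enumerate(query_history):
        age = newest - position          # 0 for the most recent query
        for term in set(terms):          # count a term once per query
            weights[term] += decay ** age
    return dict(weights)


history = [["car", "repair"], ["engine"], ["repair", "manual"]]
weights = recency_weighted_terms(history)
```

With this history, "repair" receives the highest weight because it occurs both in the oldest and in the most recent query, while "car" has almost faded out.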

In the approach that is proposed in this article we argue that there are (at least) three different functionality aspects that determine the effectiveness of the information system.

3.1 Task control

We argue that the above mentioned tools cannot provide satisfactory query results because whether a document is relevant or not cannot be easily determined based on a document's content. When a user asks for "automobile repair" a search engine will return documents with hobby repair instructions for various engine problems, detailed instructions for experienced car mechanics, help information on auto-body work, addresses of repairmen and shops, etc. Whether documents are relevant to the user depends on much more than just the subject of the document:

All these aspects are related to the specific role the user is playing within the work process. A factory organization supplies its shop floor workers with the proper material (parts, tools, etc.) based on the position of the workers in the production process. (E.g. the carpenter and the designer get different pencils.) In the same way an administrative organization must also supply its workers with the proper material (information, documents, etc.) that is suited for their role in the administrative process.

3.2 Information Retrieval

In order to realize this we propose an information retrieval system that uses:

This implies that the information retrieval and the process management are dealt with in a coherent and integrated fashion.

The information retrieval component should take into account that the data is not static but evolves over time (new information sources become available, etc.). Also the user-profile should not be considered as static. A user evolves from an inexperienced to an experienced user (and maybe from a disciplined to a sloppy one).

The retrieval tool should ask for user-feedback in order to learn what characterizes relevant documents and irrelevant ones. Unlike other tools, our proposal does not limit user-feedback to a boolean "relevant/not relevant" selection, but includes feedback on:

Some of this feedback can be given automatically by the task control component, while other aspects should be asked from the user or learned through experience. By discriminating documents according to these different criteria, the retrieval tool is able to better find information that is actually helpful for the user, in her job situation. It can also tie the different aspects together to form a more detailed classification of information, as is described in the next section.

3.3 Document management

This structured approach also gives a better tool to organize the information and document management process. Any standard factory organization invests in setting up the right support mechanism to facilitate the supply of material to its workers; the average administrative organization on the other hand does not properly acknowledge the different activities that should be involved in supplying the workers with the right documents. For example, document warehouse management is not something that can be completely left to automated systems (just as there are only exceptional cases in which fully automated hardware warehouses are feasible). Combining information retrieval with document management, gives, in our opinion, a solid base for an effective and efficient enterprise-wide information system.

3.4 Task-Based Filtering

The above considerations lead to a combination of information retrieval and task control integrated with a document management component. In this combination we have to take into account that

Taking into account the dynamic nature of the environment in all three aspects suggests using three different types of agents that each take care of one aspect. The agents should cooperate closely to realize the goal of task-based filtering in an effective and efficient manner. The overall structure of the agent system can be represented as follows:

4. A Cooperative Agent Architecture for Feedback

Altogether the architecture involves three types of cooperating agents:

Although each type of agent has its own tasks, all agents share the same structure (see [VD97,VDB97]): a knowledge base specific to the domain in which they operate and a communication component that facilitates cooperation with the other (types of) agents. The agents communicate using KQML (Knowledge Query and Manipulation Language).
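To give an impression of what such communication could look like, the following sketch renders a KQML message as text. The performative ask-one and the parameters :sender, :receiver, :reply-with, :language, :ontology and :content are part of KQML; the agent names, the ontology name and the content expression are hypothetical, and the helper function is an illustration rather than part of the architecture of [VD97,VDB97].

```python
def kqml_message(performative, **params):
    """Render a KQML performative as an s-expression string.

    Keyword names map to KQML parameters; underscores become hyphens,
    so reply_with=... is emitted as :reply-with.
    """
    parts = [performative]
    for name, value in params.items():
        parts.append(f":{name.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"


message = kqml_message(
    "ask-one",
    sender="process-agent-1",          # hypothetical agent name
    receiver="retrieval-agent-1",      # hypothetical agent name
    reply_with="q1",
    language="KIF",
    ontology="claims-handling",        # hypothetical ontology name
    content="(relevant-document ?doc)",  # hypothetical content expression
)
```

Here `message` is the string `(ask-one :sender process-agent-1 :receiver retrieval-agent-1 :reply-with q1 :language KIF :ontology claims-handling :content (relevant-document ?doc))`. The ontology parameter is a natural place for a process agent to name the business context of its request.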

4.1 Process agents

The process agents contain knowledge about the standard procedures to be followed for each task. They also contain knowledge about possible alternatives and how to construct alternatives in case of exceptional circumstances. The process agents use a goal oriented approach based on Action Workflow [MWF92] to establish this knowledge about the state of the process. In this paper we will not go into details on this aspect (see [HD97] for a thorough description).

Besides the structural knowledge about the task, the process agent also contains feedback knowledge about the (past) performance of a certain task. This knowledge is used to adjust the information requests and maybe also the user assignment to a task.

4.2 Retrieval agents

The retrieval agents should learn from the process agents about the state of the process, and therefore about the (business) purpose of supplying the information.

In the architecture that we propose here the learning retrieval agents add part of the document knowledge (meta-information) to their internal database (or user model). The retrieval agents learn through feedback of other agents as well as through feedback of the users on the relevance of the retrieved documents. The internal database of a retrieval agent serves three purposes:

The best possible use of a retrieval agent's database is the help it can provide to other agents. When an agent encounters a document for the first time, other agents may already have classified that document. Although these other agents work for users with different interests, jobs and tasks, an agent can use their judgement to better evaluate a newly found document.
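One way an agent could use the judgement of other agents is sketched below. The paper does not prescribe a combination rule; the weighting by cosine similarity between profiles, the representation of a profile as a dictionary of interest weights, and all names are assumptions of this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse profiles (dicts of weights)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def estimate_relevance(own_profile, peer_judgements):
    """Estimate the relevance of an unseen document from peer ratings.

    peer_judgements is a list of (peer_profile, rating) pairs with
    ratings between 0 and 1. Each rating is weighted by the similarity
    between the peer's profile and our own, so agents that serve users
    with similar jobs and tasks count for more. Returns None when no
    peer profile overlaps with ours.
    """
    weighted = total = 0.0
    for peer_profile, rating in peer_judgements:
        similarity = cosine(own_profile, peer_profile)
        weighted += similarity * rating
        total += similarity
    return weighted / total if total else None


mechanic = {"repair": 1.0}
estimate = estimate_relevance(
    mechanic,
    [({"repair": 2.0}, 0.8),   # similar user, rated the document highly
     ({"hobby": 1.0}, 0.1)],   # unrelated user, rating is ignored
)
```

In this example only the first peer has a profile that overlaps with the mechanic's, so the estimate follows that peer's rating.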

4.3 Document agents

In order to find relevant information quickly it helps when documents contain meta information indicating their subject, intended audience and possibly other aspects. Unfortunately it is not possible to add meta information to external documents. So, we cannot assume that it is feasible to design and implement for every document an agent with knowledge about the document and its usage. What is feasible is that a user's retrieval agent cooperates with a number of document (warehouse) agents that act as information brokers that know about the market place where documents are used (retrieved).

Although the different agents work for users with different interests, jobs and tasks, a retrieval agent can use the judgement of other agents to better evaluate a newly found document. We feel that summaries of this knowledge should be stored (learned) in special document agents dedicated to the management of the document warehouse. These document (warehouse) agents can offer the common knowledge about the documents and their usage, and they can (on the basis of this knowledge) proactively control the contents of the document warehouse.

Cooperating agents are only feasible within a single organization, and most likely also only at a single site (or geographically near sites). This implies that agents do not have to make transformations between location and organizational information. (If one user has taught her agent that a document is relevant to her location, this applies to the other users' agents as well.)

5. Conclusions and Future Work

All modern flexible organizations need support in searching, organizing and maintaining their bodies of information. This need grows as information sources become more heterogeneous and knowledge workers require more integrated information. This paper focuses on the information retrieval aspect.

Information retrieval can be improved by separating knowledge about document content from the tasks a document is intended to support and the geographic location or organization it is aimed at.

By distinguishing three types of agents, process agents, document (warehouse) agents and retrieval agents, an organization can set up a retrieval process in which the necessary knowledge is adequately distributed. When these agents cooperate, just like people cooperate in the traditional factory inventory processes, retrieval can be much better supported in flexible medium or large sized business environments.

Retrieval agents that assist users in finding information can help each other both by providing their evaluation of specific documents and by proposing documents based on the knowledge of each other's user model.

Evaluation of documents can be further enhanced by including a document's environment into that evaluation. Other documents pointing to a document, as well as pointers from the document under evaluation may provide valuable information about the purpose of a document. Also, the navigation path taken by a user to reach the document may provide cues about the type of information the user is searching for. These additional cues are not yet incorporated in our cooperating agents architecture, but will be in the near future.

6. References

[G97a] G. Golovchinsky.
Roll-Your-Own Hypertext. In Proceedings of the Flexible Hypertext Workshop, Macquarie Computing Reports C/TR97-06, pp. 49-53, 1997.
[G97b] G. Golovchinsky.
What the Query Told the Link: The integration of hypertext and information retrieval. In Proceedings of the ACM Conference on Hypertext, pp. 67-74, 1997.
[BL97] P. De Bra and W. Lemmens.
FishNet: Finding and Maintaining Information on the Net. In Proceedings of the AACE WebNet Conference, Toronto, 1997.
[HD97] G.J. Houben and F. Dignum.
Information for organized work. In F. Baader, M. Jeusfeld, W. Nutt (eds), Proceedings 4th Int. workshop Knowledge Representation meets databases, Athens, pp. 81-86, 1997.
[KUW94] St. Kirn, R. Unland and U. Wanka.
Mamba: Automatic customization of computerized business processes. In Information Systems, 19(8), pp. 661-682, 1994.
[MWF92] R. Medina-Mora, T. Winograd, R. Flores and F. Flores.
The Action Workflow Approach to Workflow Management Technology. In Computer-Supported Cooperative Work 92 Proceedings, pp. 281-288, 1992.
[VD97] E. Verharen and F. Dignum.
Cooperative Information Agents and Communication. In P. Kandzia and M. Klusch (Eds.) Cooperative Information Agents, (LNAI 1202), Springer Verlag, pp. 195-209, 1997.
[VDB97] E. Verharen, F. Dignum and S. Bos.
Implementation of a cooperative agent architecture based on the language-action perspective. In M. Singh et al. (Eds.) Proceedings of ATAL-97, Providence, USA, 1997.