FileNet Content Manager: Dos and Don'ts in Usage – Part 2

IBM FileNet Content Manager has proven to be a very stable system, and there are good reasons for that. In this blog article, we clear up further misunderstandings and shed light on the full-text index, FileStores, and virus scanners.

Many of our customers use IBM FileNet Content Manager for good reasons. To ensure that they are in control of their documents and not the other way around, they have introduced this tool as the central document management system in their company.

However, some FileNet users are not entirely satisfied with their tool. Why is that? We investigated the reasons and outline two topics that concern our customers in this blog post: the full-text index, and FileStores and their problems with virus scanners. It quickly becomes clear that the supposed problems are either inherent to the technology or can be solved with a few simple tricks.

By the way: There is already an initial blog post on the topic of "Dos and Don'ts when working with FileNet Content Manager." You can read it here.

FileNet Content Manager: Working with the full-text index

The first major topic of this blog post is the full-text index. Here are a few introductory words on this subject: The optional ability to make document collections accessible via a full-text search is basically a great thing. However, there are a handful of limitations to keep in mind.

Accuracy

A full-text index alone does not guarantee that all relevant documents will be found, not even in IBM FileNet Content Manager.

The full-text index is, by its very nature, imprecise. Example: a dedicated attribute for classifying vehicle types contains well-defined values such as "car," "truck," or "bus." In running text, however, the same passenger car could just as easily be called a "car," "vehicle," "motor vehicle," or even "rust bucket." Defining synonyms can counteract this to a certain extent.
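To make the synonym idea concrete, here is a minimal sketch in plain Python. It does not use any FileNet or search engine API; the `SYNONYMS` table, the function names, and the sample documents are all hypothetical and only illustrate the principle of expanding a search term before matching.

```python
# Hypothetical synonym table for illustration only; a real system would
# maintain synonyms in the configuration of its full-text engine.
SYNONYMS = {
    "car": {"passenger car", "motor vehicle", "vehicle", "rust bucket"},
}


def expand_query(term: str) -> set[str]:
    """Return the search term together with its configured synonyms."""
    term = term.lower()
    return SYNONYMS.get(term, set()) | {term}


def matches(text: str, term: str) -> bool:
    """True if the text contains the term or one of its synonyms.

    Simple substring matching; a real engine would tokenize the text.
    """
    lowered = text.lower()
    return any(variant in lowered for variant in expand_query(term))


documents = {
    "doc-1": "The customer reported damage to his passenger car.",
    "doc-2": "That old rust bucket failed the inspection again.",
    "doc-3": "The heap in the driveway will not start.",
}

hits = [doc_id for doc_id, text in documents.items() if matches(text, "car")]
print(hits)
```

The third document is about a car as well, but uses a word that nobody put into the synonym table. It is not found, which is exactly why synonyms help only "to a certain extent."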

However, it quickly becomes clear that the result of a full-text search will always be only an approximation and never truly accurate.

Time of indexing

Indexing in IBM FileNet Content Manager is performed asynchronously.

This is useful in scenarios involving mass storage. The supplying systems receive a receipt for each document stored. This receipt is not held back until the full-text index has been built; it is delivered the moment the document arrives on the storage medium and is registered in the database. From this point on, a search using the supplied index values works reliably, taking into account the assigned permissions. Depending on the load on the system, it may take a while before the document can also be found via its full-text content.

Therefore, we recommend using full-text search only as a supplement to the search via attributes.
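The timing gap can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not FileNet code: the `Repository` class, its methods, and the receipt format are invented for this example, and the "indexer" is triggered by hand instead of running in the background.

```python
from collections import deque


class Repository:
    """Toy model: attributes are searchable at once, full text only after indexing."""

    def __init__(self):
        self.attributes = {}        # doc_id -> attribute values (the "database")
        self.fulltext = {}          # doc_id -> indexed text
        self.index_queue = deque()  # documents still waiting for the indexer

    def store(self, doc_id, attributes, text):
        """Register the document and return a receipt without waiting for indexing."""
        self.attributes[doc_id] = attributes
        self.index_queue.append((doc_id, text))
        return f"receipt:{doc_id}"

    def run_indexer(self):
        """Work off the backlog; in a real system this happens asynchronously."""
        while self.index_queue:
            doc_id, text = self.index_queue.popleft()
            self.fulltext[doc_id] = text.lower()

    def search_attributes(self, **conditions):
        return [doc_id for doc_id, attrs in self.attributes.items()
                if all(attrs.get(key) == value for key, value in conditions.items())]

    def search_fulltext(self, keyword):
        return [doc_id for doc_id, text in self.fulltext.items()
                if keyword.lower() in text]


repo = Repository()
receipt = repo.store("doc-1", {"vehicle_type": "truck"}, "Invoice for brake repair")

found_by_attribute = repo.search_attributes(vehicle_type="truck")
found_before_indexing = repo.search_fulltext("brake")
repo.run_indexer()
found_after_indexing = repo.search_fulltext("brake")

print(receipt, found_by_attribute, found_before_indexing, found_after_indexing)
```

The receipt and the attribute hit are available immediately, while the full-text hit only appears once the indexer has caught up.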

Combined searches using attributes and full-text keywords

Even in the standard version of IBM FileNet Content Manager, it is possible to configure a search mask that works with both attributes and full-text keywords. The backend first sends the search query to the database using the attributes. The resulting hit list is then passed to the full-text search. This sequence ensures that the keywords are looked up only in documents whose attributes match the desired conditions, which optimizes performance.
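The two-stage sequence can be sketched as follows. The sample data and the function names `attribute_search`, `fulltext_search`, and `combined_search` are assumptions made for this illustration; they are not part of any FileNet API.

```python
documents = [
    {"id": "doc-1", "vehicle_type": "truck", "text": "brake repair invoice"},
    {"id": "doc-2", "vehicle_type": "car", "text": "brake inspection report"},
    {"id": "doc-3", "vehicle_type": "truck", "text": "tyre replacement invoice"},
]


def attribute_search(docs, **conditions):
    """Stage 1: narrow the candidates using exact attribute values."""
    return [doc for doc in docs
            if all(doc.get(key) == value for key, value in conditions.items())]


def fulltext_search(candidates, keyword):
    """Stage 2: look for the keyword only within the remaining candidates."""
    return [doc for doc in candidates if keyword.lower() in doc["text"].lower()]


def combined_search(docs, keyword, **conditions):
    candidates = attribute_search(docs, **conditions)
    return [doc["id"] for doc in fulltext_search(candidates, keyword)]


result = combined_search(documents, "brake", vehicle_type="truck")
print(result)
```

Only the two truck documents ever reach the keyword stage, so the more expensive text matching runs on a much smaller set.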

It is also technically possible to configure a search for full-text keywords only. In that case, however, a document may not be found even though it exists in the repository and is correctly indexed. How can this happen?

To avoid long waiting times for users, even with large data sets, the size of the hit list for a full-text query is limited. Since the full-text index does not contain any authorization information, this raw hit list must then be run through the authorization filter to ensure that users are not shown any documents for which they do not have at least read access. If the limited raw hit list happens to be filled with documents the user is not allowed to see, the filter removes them all, and a matching document that the user could read never appears because it fell outside the limit.

This risk is particularly high when searching for trivial, frequently occurring words in a large repository. In other words, full-text searching is a skill that has to be learned.
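A small simulation shows the effect. Everything in it is hypothetical: the limit of three hits, the users, and the documents are chosen only to make the mechanism visible, and the order of the raw hits is simply the order of the dictionary.

```python
HIT_LIMIT = 3  # assumed cap on the raw full-text hit list

# Indexed text per document, in the order the engine returns its hits.
index = {
    "doc-1": "contract for department A",
    "doc-2": "contract for department A",
    "doc-3": "contract for department A",
    "doc-4": "contract for department B",
}

# Read permissions live outside the full-text index.
readable_by = {
    "doc-1": {"alice"},
    "doc-2": {"alice"},
    "doc-3": {"alice"},
    "doc-4": {"bob"},
}


def fulltext_only_search(keyword, user, limit=HIT_LIMIT):
    raw_hits = [doc_id for doc_id, text in index.items() if keyword in text]
    truncated = raw_hits[:limit]  # the cap is applied before permissions are checked
    return [doc_id for doc_id in truncated if user in readable_by[doc_id]]


bob_common_word = fulltext_only_search("contract", "bob")
bob_specific_phrase = fulltext_only_search("department B", "bob")
print(bob_common_word, bob_specific_phrase)
```

Bob's search for the common word "contract" returns nothing, although `doc-4` matches and is readable for him: the three slots were used up by documents he may not see. The more specific phrase finds it right away.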

Not all file formats are indexed

The full-text engine works with a filter for file formats. Images, for example, are not indexed. This also applies if the TIFF tags are filled with text content or the EXIF data of a JPEG file contains keywords. Data in "exotic" formats such as AFP data streams is also ignored. These are encoded according to the EBCDIC standard and are not recognized.

There is an elegant workaround for such use cases. The desired text content is extracted in an upstream processing step and saved in a separate text file. The loading program transfers this text file to the FileNet system as a second content element. This trick fills the full-text index, while the user interface still returns the correct element. In the standard system, this is always the first content element of a document.
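The workaround can be modeled roughly like this. The sketch makes several assumptions that are not taken from FileNet: the `Document` and `ContentElement` classes, the list of indexable types, and the MIME label used for the AFP data are illustrative, and the `extract_text` step merely decodes EBCDIC bytes (code page 037) as a stand-in for a real extraction tool that understands the AFP structure.

```python
from dataclasses import dataclass, field

INDEXABLE_TYPES = {"text/plain"}  # assumed format filter of the full-text engine


@dataclass
class ContentElement:
    mime_type: str
    data: bytes


@dataclass
class Document:
    doc_id: str
    elements: list = field(default_factory=list)


def extract_text(raw: bytes) -> str:
    """Stand-in for the upstream extraction step: decode EBCDIC (code page 037)."""
    return raw.decode("cp037")


def load_document(doc_id: str, original: bytes, mime_type: str) -> Document:
    """Store the original as element 1 and the extracted text as element 2."""
    text = extract_text(original)
    return Document(doc_id, [
        ContentElement(mime_type, original),
        ContentElement("text/plain", text.encode("utf-8")),
    ])


def index_text(doc: Document) -> str:
    """Only elements in an indexable format contribute to the full-text index."""
    return " ".join(element.data.decode("utf-8") for element in doc.elements
                    if element.mime_type in INDEXABLE_TYPES)


def content_for_viewer(doc: Document) -> ContentElement:
    """The user interface shows the first content element."""
    return doc.elements[0]


original = "POLICY 4711 RENEWAL".encode("cp037")
doc = load_document("doc-1", original, "application/vnd.ibm.modcap")
print(index_text(doc), content_for_viewer(doc).mime_type)
```

The order of the elements matters: the original stays first so that users still open the real document, while the text file in second position exists only to feed the index.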

What exactly is indexed?

In projects, we regularly hear the wish for a "Google-like" search in IBM FileNet Content Manager that is as simple as possible.

For the reasons mentioned above, this is not practical for larger document collections. However, if you only have a few thousand contract documents, there is a trick you can use to get very close to this goal. With a single mouse click in the attribute definition, the technical attributes of a document (provided they are of the text type) can be included in the full-text index in addition to the document content. This allows you to search attribute values and text content with just a single search field.
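Conceptually, the indexed text then consists of the document content plus the selected text attributes. The following sketch only illustrates that idea; the function `indexable_text`, the attribute names, and the sample values are made up, and in the product this is a configuration setting rather than code.

```python
def indexable_text(content: str, attributes: dict, include_in_fulltext: set) -> str:
    """Combine the content with the text-type attributes flagged for indexing."""
    extra = [value for key, value in attributes.items()
             if key in include_in_fulltext and isinstance(value, str)]
    return " ".join([content, *extra])


attributes = {"contract_partner": "Example GmbH", "contract_year": 2021}
text = indexable_text(
    "Lease agreement for office space",
    attributes,
    include_in_fulltext={"contract_partner", "contract_year"},
)

# One search field now covers content and attribute values alike.
found_by_content = "office" in text.lower()
found_by_attribute = "example" in text.lower()
print(text, found_by_content, found_by_attribute)
```

The numeric `contract_year` is skipped even though it is flagged, mirroring the restriction to attributes of the text type.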

FileNet Content Manager: FileStores and virus scanners

In the second major topic of this blog post, we address an issue for administrators: FileStores and virus scanners. An up-to-date and active virus scanner is indispensable in professional IT. A consistent antivirus concept from client to server makes sense.

The element of surprise in migrations

Unpleasant surprises can occur if different virus scanners or different parameters for the same virus scanner are used on the systems involved.

One specific case: data loss can occur during a migration from a Linux to a Windows system. A document that appeared harmless to the virus scanner on the Linux system is transferred to a Windows-based FileNet system. To understand what happens next, it is necessary to take a close look at the order in which a document is technically filed.

The order of technical filing

Step 1 is the creation of a new data record, including the storage of attribute values.

Step 2 is the transfer of content, including its storage on the file system.

Result

After these two steps, the document storage system sends a success message. A fraction of a second later, the virus scanner examines the new arrival and determines that it may be a virus-infected file. As a diligent representative of its kind, the virus scanner deletes the file from the FileStore.

When a user accesses this document, they will find it using the attribute search. However, any attempt to access the content will result in an error message, as the desired content element is no longer available in the FileStore.

This unfortunate situation would also have been detected by a regularly performed FileNet consistency check.
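The idea behind such a check is simple: compare what the database believes exists with what is actually present on the file system. The following sketch is not the FileNet consistency checker; the function `find_missing_content`, the record layout, and the file names are assumptions, and a temporary directory stands in for the FileStore.

```python
import tempfile
from pathlib import Path


def find_missing_content(records: dict, store_root: Path) -> list:
    """Return the IDs of records whose content file is missing from the store."""
    return sorted(doc_id for doc_id, relative_path in records.items()
                  if not (store_root / relative_path).is_file())


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)

    # Steps 1 and 2: the data record is created and the content is written.
    records = {"doc-1": "doc-1.bin", "doc-2": "doc-2.bin"}
    for relative_path in records.values():
        (store / relative_path).write_bytes(b"content")

    # Afterwards the virus scanner removes a file it considers suspicious.
    (store / "doc-2.bin").unlink()

    missing = find_missing_content(records, store)

print(missing)
```

Run regularly, a comparison of this kind reveals the gap long before a user stumbles over the error message.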

We look forward to your "aha" moments

Do you use the full-text function in your project? Are you satisfied with your search options? Has the virus scanner ever thwarted your plans? Let us know. We look forward to a lively discussion.

About ISR

Since 1993, we have been operating as IT consultants for Data Analytics and Document Logistics, focusing on data management and process automation.
We provide comprehensive support, from strategic IT consulting to specific implementations and solutions, all the way to IT operations, within the framework of holistic Enterprise Information Management (EIM).
ISR is part of the CENIT EIM Group.
