Matching native, imaged, and searchable content files
The Ringtail load file .mdb structure allows you to use a document’s ID to import a document’s native source file, any imaged pages of that document, or any searchable content file extracted from the native file or imaged pages, such as an optical character recognition (OCR) text file.
Note: To import these files successfully, name each file after the document's Document_ID.
You do not need to include information about a document's content or searchable files in the .mdb file. Name the content file after the Document_ID and keep it in the same location (level) as all other files for the document. The content file can be any searchable file type, and its extension must match that type. A single document can have more than one searchable content file, as long as the files have different file extensions, for example, doc_id.txt and doc_id.html.
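The naming convention above can be checked with a short script before import. The following is a minimal sketch, not part of the product: the function name find_content_files and the SEARCHABLE_EXTENSIONS set are hypothetical, and the set lists only the two extensions used in the example above rather than the full list of supported searchable file types.

```python
from pathlib import Path

# Assumption: illustrative only. The real list of searchable file types
# is defined by the application, not by this script.
SEARCHABLE_EXTENSIONS = {".txt", ".html"}


def find_content_files(doc_dir, document_id):
    """Return the names of files in doc_dir that are named after
    document_id and have a searchable extension."""
    matches = []
    for path in sorted(Path(doc_dir).iterdir()):
        if (
            path.is_file()
            and path.stem == document_id
            and path.suffix.lower() in SEARCHABLE_EXTENSIONS
        ):
            matches.append(path.name)
    return matches
```

For example, calling find_content_files on a folder that holds doc_id.txt, doc_id.html, and doc_id.xls with the ID doc_id would return only the .html and .txt files, because .xls is not in the assumed set.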
To make a native source file available in Image viewer, include it in the pages table, as follows:
Use the Document_ID as the native file's file name, with the correct extension for that file type, for example, doc_id.xls or doc_id.doc.
Set the num_pages field to 1.
Reference the file from the Image_File_Name field in the pages table.
Note: The content file will be the same file referenced in the pages table if the pages table conforms to the content file-naming convention.
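The pages table values for a native file can be sketched as follows. This is an illustration only: the helper native_page_row is hypothetical, and it returns just the two fields named above (Image_File_Name and num_pages). Any other columns in the pages table are left out because they are not described here.

```python
def native_page_row(document_id, extension):
    """Build the two pages-table values described above for a native file.

    Hypothetical helper: other pages-table columns are intentionally omitted.
    """
    # Accept the extension with or without a leading dot.
    ext = extension if extension.startswith(".") else "." + extension
    return {
        "Image_File_Name": f"{document_id}{ext}",  # file named after the Document_ID
        "num_pages": 1,  # a native file is always recorded as one page
    }
```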
Page flags in content files
When creating searchable content files, you can optionally insert a page separator flag to identify page boundaries within the text. The flags let Nuix Discover link to the exact document page when you view the content. Without page separator flags, the application treats the entire searchable content file as if it were extracted from a single imaged page.
The pattern for the page separator flag is as follows:
Three hash marks
The ordinal page number
Three pipe characters
The word Page, followed by a space
The page label, as specified in the .mdb pages table
Three carets
Example: ###1|||Page SO-0015776^^^
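The flag pattern can be generated and read back with a few lines of code. The sketch below is illustrative and makes two assumptions that the text above does not confirm: a single space separates the word Page from the page label (as in the example), and each flag precedes the text of the page it identifies. The names make_page_flag and split_pages are hypothetical.

```python
import re

# Three hash marks, ordinal page number, three pipes, "Page ", label, three carets.
FLAG_PATTERN = re.compile(r"###(\d+)\|\|\|Page (.+?)\^\^\^")


def make_page_flag(ordinal, page_label):
    """Return a page separator flag for the given ordinal and page label."""
    return f"###{ordinal}|||Page {page_label}^^^"


def split_pages(text):
    """Split content text into (ordinal, label, page_text) tuples.

    Assumption: each flag comes before the text of its page.
    """
    matches = list(FLAG_PATTERN.finditer(text))
    pages = []
    for i, match in enumerate(matches):
        # A page's text runs from the end of its flag to the start of the next flag.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages.append((int(match.group(1)), match.group(2), text[match.end():end].strip()))
    return pages
```

With these definitions, make_page_flag(1, "SO-0015776") produces the same string as the example above.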