Cheshire3 Object Model - Document

API

class cheshire3.baseObjects.Document(data, creator='', history=[], mimeType='', parent=None, filename='', tagName='', byteCount=0, byteOffset=0, wordCount=0)

A Document is a wrapper for raw data and its metadata.

A Document is the raw data which will become a Record. It may be processed into a Record by a Parser, or into another Document type by a PreParser. Documents can be stored in a DocumentStore if necessary, but can generally be discarded once processed. A Document may be anything from a JPG file, to an unparsed XML file, to a string containing a URL. This allows for future compatibility with new formats, which can be incorporated into the system by implementing a Document type and a PreParser.
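The flow described above (a PreParser turns one Document into another, a Parser turns a Document into a Record) can be sketched with plain Python stand-ins. Nothing here imports Cheshire3: the classes ToyDocument, ToyRecord, StripWhitespacePreParser and KeyValueParser are hypothetical, and the method name process_document is an assumption chosen for illustration rather than a statement of the library's API.

```python
class ToyDocument:
    """Hypothetical stand-in for a Document: raw data plus a MIME type."""

    def __init__(self, data, mimeType=''):
        self.data = data
        self.mimeType = mimeType

    def get_raw(self, session):
        # Return the wrapped data unchanged.
        return self.data


class ToyRecord:
    """Hypothetical stand-in for a Record: parsed, structured data."""

    def __init__(self, fields):
        self.fields = fields


class StripWhitespacePreParser:
    """Hypothetical PreParser: takes a Document, returns a new Document."""

    def process_document(self, session, doc):
        cleaned = doc.get_raw(session).strip()
        return ToyDocument(cleaned, mimeType=doc.mimeType)


class KeyValueParser:
    """Hypothetical Parser: takes a Document, returns a Record."""

    def process_document(self, session, doc):
        fields = {}
        for line in doc.get_raw(session).splitlines():
            key, _, value = line.partition('=')
            fields[key.strip()] = value.strip()
        return ToyRecord(fields)


session = None  # placeholder; these stand-ins never inspect the session
raw = ToyDocument("  title = Alice\nyear = 1865  \n", mimeType='text/plain')

# Document -> Document (pre-parsing), then Document -> Record (parsing).
cleaned = StripWhitespacePreParser().process_document(session, raw)
record = KeyValueParser().process_document(session, cleaned)
print(record.fields)
```

The intermediate Document (cleaned) is only needed until the Record exists, which matches the note above that Documents can generally be discarded.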

get_raw(session)

Return the raw data associated with this document.
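As a rough illustration of the "wrapper for raw data and its metadata" idea, the sketch below mirrors the constructor parameters listed in the signature above. SketchDocument is a hypothetical stand-in written for this example, not the Cheshire3 implementation; in particular, how the real class stores its attributes and what it does with the session argument are not covered by this page and are assumed here.

```python
class SketchDocument:
    """Illustrative stand-in mirroring the documented constructor."""

    def __init__(self, data, creator='', history=None, mimeType='',
                 parent=None, filename='', tagName='',
                 byteCount=0, byteOffset=0, wordCount=0):
        self.data = data
        self.creator = creator
        # Copy the list so instances never share one mutable default.
        self.history = list(history) if history is not None else []
        self.mimeType = mimeType
        self.parent = parent
        self.filename = filename
        self.tagName = tagName
        self.byteCount = byteCount
        self.byteOffset = byteOffset
        self.wordCount = wordCount

    def get_raw(self, session):
        # Assumption: the session is accepted but not needed to return data.
        return self.data


session = None  # placeholder for a session object
xml = "<doc>hello</doc>"
doc = SketchDocument(
    xml,
    mimeType='text/xml',
    filename='hello.xml',
    byteCount=len(xml.encode('utf-8')),
)
print(doc.get_raw(session))
print(doc.filename, doc.mimeType, doc.byteCount)
```

The sketch uses history=None rather than a literal list default so that two instances cannot accidentally share the same history list.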

Implementations

The following implementations are included in the distribution by default:

class cheshire3.document.StringDocument(data, creator='', history=[], mimeType='', parent=None, filename=None, tagName='', byteCount=0, byteOffset=0, wordCount=0)