Cheshire3 Tutorials - Configuring Stores

Introduction

There are several types of Store objects, but here we’re primarily concerned with RecordStores. DocumentStores are practically identical to RecordStores in terms of configuration, so we’ll talk about the two together.

Database-specific stores are included in the <subConfigs> section of a database configuration file [Database Config Tutorial, Database Config Reference].
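As a rough sketch of that placement, the skeleton below shows a database configuration with a <subConfigs> section. The identifier db_example, the objectType database.SimpleDatabase and the defaultPath value are illustrative assumptions rather than values taken from this tutorial; the Database Config Reference remains the authority on the exact layout.

```xml
<!-- Hypothetical database configuration; id and defaultPath are placeholders -->
<config type="database" id="db_example">
    <objectType>database.SimpleDatabase</objectType>
    <paths>
        <!-- Relative store paths are resolved against this directory -->
        <path type="defaultPath">/path/to/database</path>
    </paths>
    <subConfigs>
        <!-- Database-specific store configurations (subConfig elements) go here -->
    </subConfigs>
</config>
```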

Example

Example store configuration:

 1  <subConfig type="recordStore" id="eadRecordStore">
 2      <objectType>recordStore.BdbRecordStore</objectType>
 3      <paths>
 4          <path type="databasePath">recordStore.bdb</path>
 5          <object type="idNormalizer" ref="StringIntNormalizer"/>
 6      </paths>
 7      <options>
 8          <setting type="digest">sha</setting>
 9      </options>
10  </subConfig>

Explanation

Line 1 starts a new RecordStore configuration for an object with the identifier eadRecordStore.

Line 2 declares, via <objectType>, that the object should be instantiated as cheshire3.recordStore.BdbRecordStore. Several store classes are distributed with Cheshire3; another is cheshire3.sql.recordStore.PostgresRecordStore, which maintains the data and associated metadata in a PostgreSQL relational database (this assumes that you installed Cheshire3 with the optional sql features enabled; see /install for details). The default is the much faster Berkeley DB based store.

Then we have two fields wrapped in the <paths> section. Line 4 gives the filename of the database to use, in this case recordStore.bdb. Remember that this will be resolved relative to the current defaultPath.

Line 5 references a Normalizer object, which is used to turn the Record identifiers into something appropriate for the underlying storage system. In this case, it turns integers into strings (as Berkeley DB only has string keys). It’s safest to leave this alone, unless you know that you’re always going to assign string-based identifiers before storing Records.

Line 8 has a setting called digest. This configures the RecordStore to maintain a checksum for each Record to ensure that it remains unique within the store. Two checksum algorithms are available at the moment: ‘sha’ and ‘md5’. If the setting is left out, the store will be slightly faster, but will allow (potentially inadvertent) duplicate records.
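As a small illustration of the digest setting, the fragment below swaps the checksum algorithm to ‘md5’. It is only a sketch of the <options> section from the example above; removing the <setting> element altogether would correspond to the faster, duplicate-permitting behaviour just described.

```xml
<options>
    <!-- Use the md5 algorithm instead of sha for the per-Record checksum -->
    <setting type="digest">md5</setting>
</options>
```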

Some additional objects, not shown here, can also be referenced in the <paths> section:

inTransformer

A Transformer to run the Record through in order to transform (serialize) it for storing.

If configured, this takes priority over inWorkflow, which will be ignored.

If not configured, the store reverts to inWorkflow.

outParser

A Parser to run the stored data through in order to parse (deserialize) it back into a Record.

If configured, this takes priority over outWorkflow, which will be ignored.

If not configured, the store reverts to outWorkflow.

inWorkflow

A Workflow to run the Record through in order to transform (serialize) it for storing.

The use of a Workflow rather than a Transformer enables chaining of objects, e.g. an XmlTransformer to serialize the Record to XML, followed by a GzipPreParser to compress the XML before storing on disk. In this case one would need to configure an outWorkflow to reverse the process.

If not configured, a Record will be serialized using its get_xml(session) method.

outWorkflow

A Workflow to run the stored data through in order to turn it back into a Record.

The use of a Workflow rather than a Parser enables chaining of objects, e.g. a GunzipPreParser to decompress the data back to XML, followed by an LxmlParser to parse (deserialize) the XML back into a Record.

If not configured, the raw XML data will be parsed (deserialized) using an LxmlParser if one can be obtained from the Server, otherwise using a BSLxmlParser.
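The two ways of controlling serialization described above might be configured along the following lines. This is a sketch only: it assumes that objects with the identifiers XmlTransformer and LxmlParser are available to the store, and the workflow identifiers compressRecordWorkflow and decompressRecordWorkflow are hypothetical names for Workflows that would have to be defined elsewhere in the configuration.

```xml
<!-- Option 1: a single Transformer / Parser pair -->
<paths>
    <path type="databasePath">recordStore.bdb</path>
    <object type="idNormalizer" ref="StringIntNormalizer"/>
    <!-- Assumed identifiers; these take priority over any inWorkflow / outWorkflow -->
    <object type="inTransformer" ref="XmlTransformer"/>
    <object type="outParser" ref="LxmlParser"/>
</paths>

<!-- Option 2: chained processing via Workflows (hypothetical identifiers) -->
<paths>
    <path type="databasePath">recordStore.bdb</path>
    <object type="idNormalizer" ref="StringIntNormalizer"/>
    <!-- e.g. serialize to XML, then gzip -->
    <object type="inWorkflow" ref="compressRecordWorkflow"/>
    <!-- e.g. gunzip, then parse the XML back into a Record -->
    <object type="outWorkflow" ref="decompressRecordWorkflow"/>
</paths>
```

Only one of the two <paths> variants would appear in a given store configuration; they are shown together here for comparison.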

DocumentStores

For DocumentStores, all we would change is the identifier, the <objectType>, and probably the databasePath. Everything else can remain pretty much the same. However, DocumentStores have slightly different additional objects that can be referenced in the <paths> section:

inPreParser

A PreParser to run the Document through before storing its content.

For example, a GzipPreParser could compress the Document content before it is stored on disk. In this case one would need to configure a GunzipPreParser as the outPreParser.

If configured, this takes priority over inWorkflow, which will be ignored.

If not configured, the store reverts to inWorkflow.

outPreParser

A PreParser to run the stored data through before returning the Document.

For example, a GunzipPreParser could decompress the data from the disk to turn it back into the original Document content.

If configured, this takes priority over outWorkflow, which will be ignored.

If not configured, the store reverts to outWorkflow.
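Putting this together, a DocumentStore that compresses its content on disk might look roughly like the sketch below. The identifier eadDocumentStore, the filename documentStore.bdb, the type attribute value documentStore, the class documentStore.BdbDocumentStore, and the availability of objects with the identifiers GzipPreParser and GunzipPreParser are all assumptions made for illustration; check them against your own configuration before relying on them.

```xml
<!-- Hypothetical DocumentStore, modelled on the RecordStore example -->
<subConfig type="documentStore" id="eadDocumentStore">
    <objectType>documentStore.BdbDocumentStore</objectType>
    <paths>
        <path type="databasePath">documentStore.bdb</path>
        <object type="idNormalizer" ref="StringIntNormalizer"/>
        <!-- Compress on the way in, decompress on the way out (assumed identifiers) -->
        <object type="inPreParser" ref="GzipPreParser"/>
        <object type="outPreParser" ref="GunzipPreParser"/>
    </paths>
</subConfig>
```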