Cheshire3 Object Model - TokenMerger

API

class cheshire3.baseObjects.TokenMerger(session, config, parent=None)[source]

A TokenMerger merges identical tokens and returns a hash.

A TokenMerger takes an ordered list of tokens (i.e. as produced by a Tokenizer) and merges them into a hash. This might involve merging multiple tokens per key, while maintaining frequency, proximity information etc.

One or more Normalizers may occur in the processing chain between a Tokenizer and TokenMerger in order to reduce dimensionality of terms.

process_hash(session, data)[source]

Merge and return tokens found in a hash.

process_string(session, data)[source]

Merge and return tokens found in a raw string.

Implementations

The following implementations are included in the distribution by default:

class cheshire3.tokenMerger.SimpleTokenMerger(session, config, parent=None)[source]
class cheshire3.tokenMerger.ProximityTokenMerger(session, config, parent=None)[source]
class cheshire3.tokenMerger.OffsetProximityTokenMerger(session, config, parent=None)[source]
class cheshire3.tokenMerger.RangeTokenMerger(session, config, parent)[source]
class cheshire3.tokenMerger.SequenceRangeTokenMerger(session, config, parent)[source]

Merges tokens into a range for use in RangeIndexes.

Assumes that we’ve tokenized a single value into pairs, which need to be concatenated into ranges.

class cheshire3.tokenMerger.MinMaxRangeTokenMerger(session, config, parent)[source]

Merges tokens into a range for use in RangeIndexes.

Uses a forward slash (/) as the interval designator after ISO 8601.

class cheshire3.tokenMerger.NGramTokenMerger(session, config, parent)[source]
class cheshire3.tokenMerger.ReconstructTokenMerger(session, config, parent=None)[source]
class cheshire3.tokenMerger.PhraseTokenMerger(session, config, parent)[source]