Cheshire3 Object Model - Extractor

API

class cheshire3.baseObjects.Extractor(session, config, parent=None)

An Extractor takes selected data and returns extracted values.

An Extractor is a processing object called by an Index with the value returned by a Selector, and extracts the values into an appropriate data structure (a dictionary/hash/associative array).
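As a rough illustration of the kind of data structure meant here, the sketch below maps each extracted value to a small record. The function name `extract_exact` and the record keys `text` and `occurrences` are assumptions made for this example; they are not taken from the Cheshire3 source.

```python
def extract_exact(data):
    """Return a dict keyed by the extracted value (hypothetical layout)."""
    # Each key is an extracted value; each record holds details about it.
    # The record layout is an assumption for illustration only.
    return {data: {"text": data, "occurrences": 1}}


result = extract_exact("Cheshire Cat")
```

Keying the dictionary by the extracted value makes it straightforward for a caller to merge results from several selections.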

Example Extractors might extract all text from within a DOM node / etree Element, or select all text that occurs between a pair of selected DOM nodes / etree Elements.
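The first of these cases can be sketched with the standard library's ElementTree. This is a minimal stand-alone example, not Cheshire3 code: the function name `extract_node_text` and the record layout are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_node_text(node):
    """Concatenate all text inside an Element, including descendants."""
    # itertext() yields the inner text of this element and of its
    # descendants in document order.
    text = "".join(node.itertext())
    # Hypothetical record layout, chosen for illustration only.
    return {text: {"text": text, "occurrences": 1}}


elem = ET.fromstring("<title>Alice in <i>Wonderland</i></title>")
extracted = extract_node_text(elem)
```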

Extractors must also be applied to query terms so that, for example, the same keyword processing rules are used for queries as for indexed data.

process_eventList(session, data)

Process a list of SAX events serialized in C3 internal format.

process_node(session, data)

Process a DOM node.

process_string(session, data)

Process and return the value of a raw string, e.g. from an attribute value or the query.

process_xpathResult(session, data)

Process the result of an XPath expression.

Convenience function to wrap the other process_* functions and do type checking.
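The type-checking dispatch described here might look roughly like the following. `SketchExtractor` is a hypothetical stand-in rather than the real base class, the input is assumed to be a list of strings and/or ElementTree Elements, and handling of SAX event lists is omitted because their internal format is not described on this page.

```python
import xml.etree.ElementTree as ET


class SketchExtractor:
    """Hypothetical stand-in, not cheshire3.baseObjects.Extractor."""

    def process_string(self, session, data):
        # Record layout is an assumption for illustration only.
        return {data: {"text": data, "occurrences": 1}}

    def process_node(self, session, data):
        # Reduce the node to its inner text, then reuse process_string.
        return self.process_string(session, "".join(data.itertext()))

    def process_xpathResult(self, session, data):
        # Assumption: data is a list of strings and/or Elements.
        merged = {}
        for item in data:
            if isinstance(item, str):
                found = self.process_string(session, item)
            elif isinstance(item, ET.Element):
                found = self.process_node(session, item)
            else:
                raise TypeError("unsupported XPath result: %r" % type(item))
            # Merge per-item results, summing counts for repeated values.
            for key, record in found.items():
                if key in merged:
                    merged[key]["occurrences"] += record["occurrences"]
                else:
                    merged[key] = record
        return merged


extractor = SketchExtractor()
items = ["cat", ET.fromstring("<a>cat</a>"), "hat"]
combined = extractor.process_xpathResult(None, items)
```

Passing `None` as the session is only acceptable because this sketch never uses it.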

Implementations

The following implementations are included in the distribution by default:

class cheshire3.extractor.SimpleExtractor(session, config, parent)

Base extractor; extracts the exact text.

class cheshire3.extractor.TeiExtractor(session, config, parent)

class cheshire3.extractor.SpanXPathExtractor(session, config, parent)

Select all text that occurs between a pair of selections.