Cheshire3 Object Model - Parser

API

class cheshire3.baseObjects.Parser(session, config, parent=None)[source]

A Parser takes a Document and parses it to a Record.

Parsers could be viewed as Record Factories. They take a Document containing some data and produce the equivalent Record.

A Parser is often a simple wrapper around an XML parser; however, implementations also exist for various types of RDF data.

process_document(session, doc)[source]

Take a Document, parse it and return a Record object.
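The Document-in, Record-out contract described above can be sketched with the standard library alone. This is a minimal illustration, not Cheshire3 code: `SketchDocument`, `SketchRecord` and `SketchParser` are hypothetical stand-ins, and only the shape of `process_document(session, doc)` is taken from the API above.

```python
import xml.etree.ElementTree as ET


class SketchDocument:
    """Hypothetical stand-in for a Document: holds unparsed text."""

    def __init__(self, text):
        self.text = text


class SketchRecord:
    """Hypothetical stand-in for a Record: wraps the parsed tree."""

    def __init__(self, tree):
        self.tree = tree


class SketchParser:
    """Mirrors the shape of Parser.process_document(session, doc)."""

    def process_document(self, session, doc):
        # Parse the raw text, then wrap the resulting tree in a record.
        root = ET.fromstring(doc.text)
        return SketchRecord(root)


doc = SketchDocument("<record><title>Alice</title></record>")
# The session argument is unused in this sketch, so None is passed.
rec = SketchParser().process_document(None, doc)
title = rec.tree.findtext("title")
```

The parser itself holds no per-document state here, which is what lets it behave like a factory: one parser object can turn any number of documents into records.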

Implementations

The following implementations are included in the distribution by default:

class cheshire3.parser.MinidomParser(session, config, parent=None)[source]

Use the default Python minidom implementation to parse the document.
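For orientation, this is roughly what parsing with Python's `xml.dom.minidom` looks like on its own. The helper name `parse_with_minidom` is an assumption made for illustration, and returning `None` on malformed input is a choice of this sketch; how MinidomParser itself reports errors is not stated here.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_with_minidom(text):
    """Return a DOM Document, or None if the text is not well-formed XML."""
    try:
        return minidom.parseString(text)
    except ExpatError:
        # minidom delegates to expat, which raises on malformed input.
        return None


dom = parse_with_minidom("<doc><p>one</p><p>two</p></doc>")
# Collect the text content of every <p> element in document order.
paragraphs = [p.firstChild.data for p in dom.getElementsByTagName("p")]

# A mismatched closing tag makes the input ill-formed.
broken = parse_with_minidom("<doc><p>unclosed</doc>")
```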

class cheshire3.parser.SaxParser(session, config, parent)[source]

Default SAX-based parser. Creates a SaxRecord.
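A SAX-based parser works from a stream of events rather than a tree. The sketch below collects those events into a plain Python list using the standard `xml.sax` module. The `EventCollector` class and the tuple layout are assumptions for illustration only; the format in which SaxRecord actually stores its events is not described here.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventCollector(ContentHandler):
    """Hypothetical handler that records SAX callbacks as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Copy the attributes into a plain dict so they outlive the parse.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only text nodes to keep the list compact.
        if content.strip():
            self.events.append(("text", content))


def sax_events(text):
    """Parse XML text and return the list of collected events."""
    handler = EventCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


events = sax_events('<doc id="1"><p>hi</p></doc>')
```

Because the events arrive in document order, a list like this can be replayed later without re-parsing the original text.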

class cheshire3.parser.StoredSaxParser(session, config, parent=None)[source]

class cheshire3.parser.LxmlParser(session, config, parent)[source]

lxml-based parser. Creates LxmlRecords.

class cheshire3.parser.LxmlHtmlParser(session, config, parent)[source]

lxml-based parser for HTML documents.

class cheshire3.parser.PassThroughParser(session, config, parent=None)[source]

Take a Document that already contains parsed data and return a Record.

Copy the data from a Document (e.g. a list of SAX events or a DOM tree) into an appropriate Record object.
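The pass-through idea can be sketched as a dispatch on the type of data the document already carries. Everything named here (`ParsedDocument`, `EventListRecord`, `DomRecord`, `pass_through`) is hypothetical and chosen for illustration; the sketch only assumes that the data is either a list of events or a minidom tree.

```python
from xml.dom import minidom


class ParsedDocument:
    """Hypothetical document whose data has already been parsed."""

    def __init__(self, data):
        self.data = data


class EventListRecord:
    """Hypothetical record backed by a list of SAX-style events."""

    def __init__(self, events):
        self.events = events


class DomRecord:
    """Hypothetical record backed by a DOM tree."""

    def __init__(self, dom):
        self.dom = dom


def pass_through(doc):
    """Wrap already-parsed data in a matching record without re-parsing."""
    data = doc.data
    if isinstance(data, list):
        # Copy the list so later changes to the document do not leak in.
        return EventListRecord(list(data))
    if isinstance(data, minidom.Document):
        return DomRecord(data)
    raise TypeError("unsupported parsed data: %r" % type(data))


event_rec = pass_through(ParsedDocument([("start", "doc"), ("end", "doc")]))
dom_rec = pass_through(ParsedDocument(minidom.parseString("<doc/>")))
```

No parsing happens in `pass_through`; the only work is choosing a record type that matches the data.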

class cheshire3.parser.MarcParser(session, config, parent=None)[source]

Creates MarcRecords, which emulate the Record API for MARC data.