docria.codec
Codecs, encoding/decoding documents to/from binary or text representations
Classes
Codec – Utility methods for all codecs
JSON codec
MsgpackCodec – MessagePack document codec
MsgpackDocument – MessagePack Document, allows partial decoding
Embeddable document as an extended type
XmlCodec – XML Codec, only encoding support
Exceptions
Serialization/Deserialization failure
- class docria.codec.Codec
Utility methods for all codecs
- class docria.codec.MsgpackCodec
MessagePack document codec
- static compute_text_offsets(doc, texts)
Computes all offsets and inserts text into document
- class docria.codec.MsgpackDocument(data_or_document, ref=None)
MessagePack Document, allows partial decoding
- Example
>>> from docria.model import Document, DataTypes as T, Node
>>> from docria.codec import MsgpackDocument
>>>
>>> doc = Document()
>>> tokens = doc.add_layer("token", pos=T.string)
>>> node = Node(pos="NN")
>>> tokens.add_many([node])
>>>
>>> # Convert document to msgpack encoded binary data
>>> msgdoc = MsgpackDocument(doc)
>>> bytes_data = msgdoc.binary()  # type: bytes
>>>
>>> # Convert from msgpack encoded binary data to document
>>> newdoc = MsgpackDocument(bytes_data)
>>> doc = newdoc.document()
- class docria.codec.XmlCodec
XML Codec, only encoding support
- static encode_intermediate(doc, **kwargs)
Conversion of docria document into an intermediate form: texts, schema and layer data.
- Parameters
doc – docria document
kwargs – options for compile
- Returns
- static encode_tree(doc, verbose=False, verbose_node_spans=False, document_id='', **kwargs)
Encodes a docria document into an XML representation.
- Parameters
doc (Document) – docria document
verbose – add extra attributes to the XML data for readability and simpler tooling
verbose_node_spans – add extra nodes for each node, materializing the span for readability
document_id – the globally unique document id
kwargs – additional options, see XmlCodec.encode_intermediate for options
- Return type
ElementTree
- Returns
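
Since encode_tree is documented as returning an ElementTree, the result can presumably be serialized through the tree's write() method. The following is a minimal sketch under that assumption: tree_to_bytes is a hypothetical helper, not part of docria, and the tree is a stand-in built with the standard library (with made-up element names) so the snippet runs without docria installed. The actual XML layout produced by XmlCodec is not shown here.

```python
import io
import xml.etree.ElementTree as ET


def tree_to_bytes(tree, encoding="utf-8"):
    """Serialize an ElementTree-like object to bytes via its write() method.

    Hypothetical helper, not part of docria.
    """
    buffer = io.BytesIO()
    tree.write(buffer, encoding=encoding, xml_declaration=True)
    return buffer.getvalue()


# Stand-in tree with invented element names. With docria installed, the tree
# would instead come from something like:
#   tree = XmlCodec.encode_tree(doc, document_id="doc-1")
root = ET.Element("document", {"id": "doc-1"})
ET.SubElement(root, "text").text = "Hello world"
tree = ET.ElementTree(root)

xml_bytes = tree_to_bytes(tree)
print(xml_bytes.decode("utf-8"))
```

Writing into an in-memory buffer keeps the helper independent of the file system; passing a file path to write() instead would store the XML on disk.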