semantic_html.models

Attributes

DEFAULT_CONTEXT

WADM_CONTEXT

Classes

BaseGraphItem

Base class for all graph items with standardized fields.

NoteItem

Graph item subclass of BaseGraphItem with the same standardized fields.

StructureItem

Graph item subclass of BaseGraphItem with the same standardized fields, plus a level.

LocatorItem

Graph item subclass of BaseGraphItem with the same standardized fields.

DocItem

Graph item subclass of BaseGraphItem with the same standardized fields.

AnnotationItem

Graph item subclass of BaseGraphItem with the same standardized fields.

QuotationItem

Graph item subclass of BaseGraphItem with the same standardized fields.

Functions

generate_wadm_annotation(item)

build_tei_from_items(base_items)

wadm_to_conll(wadm[, config, jsonld])

Convert WADM annotations into CoNLL format.

Module Contents

semantic_html.models.DEFAULT_CONTEXT
semantic_html.models.WADM_CONTEXT = 'https://www.w3.org/ns/anno.jsonld'
class semantic_html.models.BaseGraphItem(type_, text=None, metadata=None, selector=None, **kwargs)

Base class for all graph items with standardized fields.

data
selector
wadm_metadata
to_dict()

Return the graph item as a dictionary.

to_wadm()

Return a WADM-conformant dictionary representation.
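As a rough illustration of what a WADM-conformant dictionary can look like, the sketch below builds one by hand following the W3C Web Annotation Data Model. It does not call semantic_html and may differ from what to_wadm() actually returns: the helper name sketch_wadm_annotation, its parameters, the urn:example source, and the "Person" body value are all assumptions made for the example. Only the WADM_CONTEXT value is taken from this module.

```python
import json

# Value of WADM_CONTEXT as listed in this module.
WADM_CONTEXT = "https://www.w3.org/ns/anno.jsonld"


def sketch_wadm_annotation(exact, source, start, end, body_value):
    """Build a Web Annotation-shaped dict.

    Hypothetical helper for illustration only; not part of semantic_html.
    """
    return {
        "@context": WADM_CONTEXT,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": body_value},
        "target": {
            "source": source,
            # Two standard selector types: the quoted text and its character offsets.
            "selector": [
                {"type": "TextQuoteSelector", "exact": exact},
                {"type": "TextPositionSelector", "start": start, "end": end},
            ],
        },
    }


text = "Ada Lovelace wrote the first program."
annotation = sketch_wadm_annotation(
    exact=text[0:12],
    source="urn:example:doc-1",  # assumed identifier, not a real document
    start=0,
    end=12,
    body_value="Person",
)
print(json.dumps(annotation, indent=2))
```

The dictionary is plain JSON-serializable data, so it can be dumped with json.dumps or passed to a JSON-LD processor.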

class semantic_html.models.NoteItem(text, **kwargs)

Bases: BaseGraphItem

Inherits the standardized fields and the to_dict() and to_wadm() methods of BaseGraphItem.

class semantic_html.models.StructureItem(text, level, **kwargs)

Bases: BaseGraphItem

Inherits the standardized fields and the to_dict() and to_wadm() methods of BaseGraphItem. Takes an additional level argument.

class semantic_html.models.LocatorItem(text, **kwargs)

Bases: BaseGraphItem

Inherits the standardized fields and the to_dict() and to_wadm() methods of BaseGraphItem.

class semantic_html.models.DocItem(text, **kwargs)

Bases: BaseGraphItem

Inherits the standardized fields and the to_dict() and to_wadm() methods of BaseGraphItem.

class semantic_html.models.AnnotationItem(text, **kwargs)

Bases: BaseGraphItem

Inherits the standardized fields and the to_dict() and to_wadm() methods of BaseGraphItem.

class semantic_html.models.QuotationItem(text, **kwargs)

Bases: BaseGraphItem

Inherits the standardized fields and the to_dict() and to_wadm() methods of BaseGraphItem.

semantic_html.models.generate_wadm_annotation(item)
semantic_html.models.build_tei_from_items(base_items: list[BaseGraphItem])
semantic_html.models.wadm_to_conll(wadm, config: dict = None, jsonld: dict = None)

Convert WADM annotations into CoNLL format.

Parameters:
- wadm: dict with 'text' + 'annotations', or a list of annotations
- jsonld: optional ground-truth JSON-LD, used to resolve source -> text
- config: options (max_span_tokens, whitelist, blacklist, type_whitelist)

Returns: CoNLL string
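To make the idea of a span-to-CoNLL conversion concrete, here is a minimal standalone sketch. It is not the implementation of wadm_to_conll: the function name sketch_wadm_to_conll, the (start, end, label) span format, whitespace tokenization, BIO tagging, and the reading of max_span_tokens as "skip spans longer than this many tokens" are all assumptions chosen for illustration.

```python
import re


def sketch_wadm_to_conll(text, annotations, max_span_tokens=None):
    """Hypothetical stand-in; not the library's implementation.

    text: the annotated string
    annotations: list of (start, end, label) character spans (assumed format)
    max_span_tokens: skip spans covering more tokens than this (None = no limit)
    """
    # Whitespace tokenization, keeping the character offsets of each token.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)

    for start, end, label in annotations:
        # Indices of tokens that overlap the annotated character range.
        covered = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
        if not covered:
            continue
        if max_span_tokens is not None and len(covered) > max_span_tokens:
            continue
        # BIO scheme: first token gets B-, the rest get I-.
        for n, i in enumerate(covered):
            tags[i] = ("B-" if n == 0 else "I-") + label

    # One "token<TAB>tag" pair per line.
    return "\n".join(f"{tok}\t{tag}" for (tok, _, _), tag in zip(tokens, tags))


text = "Ada Lovelace wrote the first program."
conll = sketch_wadm_to_conll(text, [(0, 12, "PER")])
print(conll)
```

In this sketch the character offsets play the role of a WADM TextPositionSelector; a real conversion would also need to handle overlapping spans and the whitelist and blacklist options, which are left out here.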