semantic_html.parser

Functions

parse_note(→ dict)

Parses an HTML note string into a JSON-LD dictionary (optionally also annotated HTML).

Module Contents

semantic_html.parser.parse_note(html: str, mapping: dict, note_uri: str = None, metadata: dict = None, rdfa: bool = False, wadm: bool = False, remove_empty_tags: bool = True) → dict

Parses an HTML note string into a JSON-LD dictionary (optionally also annotated HTML).

Parameters:
  • html (str) – The HTML content of the note.

  • mapping (dict) – A dictionary mapping classes, tags, styles, and types.

  • note_uri (str, optional) – If provided, used as the note’s @id. Can also be supplied as a key in the mapping dict.

  • metadata (dict, optional) – A dictionary with additional keys to append to each item (e.g. provenance information). Can also be set as a dict under the ‘metadata’ key in mapping.

  • rdfa (bool, optional) – If True, also return RDFa-annotated HTML.

  • wadm (bool, optional) – If True, also return Web Annotation Data Model conformant JSON-LD.

  • remove_empty_tags (bool, optional) – If True, empty tags will be removed from HTML before parsing.

Returns:

A dict with keys for the JSON-LD, WADM, and RDFa outputs.

Return type:

dict
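
Example:

The following is a minimal sketch of one way to call parse_note with both optional outputs switched on, based only on the signature documented above. The wrapper name parse_note_outputs, its parser argument, and the "generator" metadata key are hypothetical and exist only for illustration. The structure of mapping and the exact key names of the returned dict are not specified in this reference, so the sketch passes the mapping through untouched and returns the result unchanged.

```python
from typing import Callable, Optional


def parse_note_outputs(
    html: str,
    mapping: dict,
    note_uri: Optional[str] = None,
    parser: Optional[Callable[..., dict]] = None,
) -> dict:
    """Call parse_note with RDFa and WADM output enabled (hypothetical wrapper)."""
    if parser is None:
        # Imported lazily so the wrapper can be defined and tested with a
        # stand-in callable even when semantic_html is not installed.
        from semantic_html.parser import parse_note as parser

    return parser(
        html,
        mapping,
        note_uri=note_uri,
        # Hypothetical provenance entry; any extra keys could be used here.
        metadata={"generator": "example-script"},
        rdfa=True,                # also request RDFa-annotated HTML
        wadm=True,                # also request WADM-conformant JSON-LD
        remove_empty_tags=True,   # documented default, stated explicitly
    )
```

With the package installed, calling parse_note_outputs(html, mapping, note_uri="https://example.org/notes/1") delegates to semantic_html.parser.parse_note. Inspecting the keys of the returned dict is the safest way to find where the JSON-LD, WADM, and RDFa outputs are stored.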