Documentation

Markup is an online annotation tool that can be used to transform unstructured documents into structured standoff format for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

Setup

Quantity - The number of documents you intend to annotate (defaults to single).

Single Document

Required

Document to annotate - The document you intend to annotate (must be a .txt or .html file).

Configurations - Defines all entities and attributes available during annotation. Select an existing configuration file (must be a .conf file formatted as shown here), or create a new one using the in-built config creator.

Optional

Existing annotations - A file containing existing annotations for the document you intend to annotate (must be an .ann file in standoff format).
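The standoff format is not spelled out on this page. Assuming Markup follows the brat-style standoff conventions (each text-bound annotation on its own line as T<id>, TAB, "<entity> <start> <end>", TAB, covered text, with attributes on separate A lines), reading an existing annotation file might look like the minimal Python sketch below. The class and function names are hypothetical, and other brat line types (relations, events, notes) are simply skipped.

```python
from dataclasses import dataclass, field


@dataclass
class TextBound:
    """A text-bound ("T") annotation: an entity label over one or more character spans."""
    ann_id: str
    entity: str
    spans: list[tuple[int, int]]  # (start, end) character offsets; end is exclusive
    text: str
    attributes: dict[str, str] = field(default_factory=dict)


def parse_ann(content: str) -> dict[str, TextBound]:
    """Parse T and A lines of an assumed brat-style .ann file; other line types are ignored."""
    annotations: dict[str, TextBound] = {}
    for line in content.splitlines():
        if line.startswith("T"):
            # e.g. "T1\tDiagnosis 12 25\tfocal seizure"
            ann_id, label_and_spans, text = line.split("\t", 2)
            entity, offsets = label_and_spans.split(" ", 1)
            # Discontinuous spans are separated by ";", e.g. "0 5;16 23"
            spans = [tuple(map(int, part.split())) for part in offsets.split(";")]
            annotations[ann_id] = TextBound(ann_id, entity, spans, text)
        elif line.startswith("A"):
            # e.g. "A1\tCertainty T1 Confirmed"; binary attributes carry no value
            _, body = line.split("\t", 1)
            name, target, *value = body.split()
            if target in annotations:
                annotations[target].attributes[name] = value[0] if value else "true"
    return annotations
```

For the document text "Patient has focal seizure.", the line "T1\tDiagnosis 12 25\tfocal seizure" refers to characters 12 to 25, and the entity name "Diagnosis" is purely illustrative.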

Ontology - A dictionary of terms, codes, and related data to be accessed during annotation.

  • Pre-loaded - Markup offers pre-loaded ontologies. Certain pre-loaded ontologies require external permissions (e.g. the Unified Medical Language System), for which you will need to log in to authorise access.
  • Custom - You can provide a custom ontology as a text file in which each line has the form [TERM][TAB][CODE], as shown below.

Focal Seizure	RANDOM01938
Epilepsy	RANDOM43904
Sodium Valproate	RANDOM30921
...
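A custom ontology in this format can be read with a few lines of code. The Python sketch below (function names are hypothetical) loads [TERM][TAB][CODE] lines into a dictionary and then runs a naive, case-insensitive search of a document for known terms. It only illustrates the idea behind ontology-mapping suggestions; it is not how Markup itself matches terms.

```python
def load_ontology(content: str) -> dict[str, str]:
    """Map each term to its code, reading lines of the form TERM<TAB>CODE."""
    ontology: dict[str, str] = {}
    for line_number, line in enumerate(content.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        term, sep, code = line.partition("\t")
        if not sep:
            raise ValueError(f"line {line_number}: expected TERM<TAB>CODE, got {line!r}")
        ontology[term.strip()] = code.strip()
    return ontology


def find_terms(text: str, ontology: dict[str, str]) -> list[tuple[int, int, str, str]]:
    """Naive case-insensitive substring search; returns sorted (start, end, matched text, code)."""
    lowered = text.lower()
    matches = []
    for term, code in ontology.items():
        needle = term.lower()
        start = lowered.find(needle)
        while start != -1:
            end = start + len(needle)
            matches.append((start, end, text[start:end], code))
            start = lowered.find(needle, start + 1)
    return sorted(matches)
```

Because this search ignores word boundaries, synonyms and abbreviations, a real suggestion system would need considerably more than plain substring matching.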

Multiple Documents

Opening multiple documents streamlines the annotation process by enabling the navigation, annotation, and exportation of any number of documents during a single session.

Required

Folder to annotate - The folder containing the documents you wish to annotate. The folder must contain:

  • The documents you wish to annotate (must be .txt or .html files).
  • A configuration file which defines all entities and attributes available during annotation. Select an existing configuration file (must be a .conf file formatted as shown here), or create a new one using the in-built config creator.
  • Any existing annotation files you wish to use (must be .ann files). The name of each annotation file must match the name of its corresponding text document (e.g. annotations for some-random-file.txt must be stored in some-random-file.ann).
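The naming rule above can be checked before a folder is opened. This hypothetical Python helper pairs each .txt or .html document with the .ann file of the same name (or None when there is none). Requiring exactly one .conf file in the folder is an assumption beyond what this page states.

```python
from pathlib import Path
from typing import Optional


def pair_documents(folder) -> list[tuple[Path, Optional[Path]]]:
    """Pair each .txt/.html document with its same-named .ann file, if one exists."""
    folder = Path(folder)
    documents = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() in {".txt", ".html"}
    )
    pairs = []
    for document in documents:
        # some-random-file.txt -> some-random-file.ann
        annotation = document.with_suffix(".ann")
        pairs.append((document, annotation if annotation.is_file() else None))
    return pairs


def find_config(folder) -> Path:
    """Return the folder's .conf file (assumes exactly one is present)."""
    configs = sorted(Path(folder).glob("*.conf"))
    if len(configs) != 1:
        raise ValueError(f"expected exactly one .conf file, found {len(configs)}")
    return configs[0]
```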

Optional

Ontology - A dictionary of terms, codes, and related data to be accessed during annotation.

  • Pre-loaded - Markup offers pre-loaded ontologies. Certain pre-loaded ontologies require external permissions (e.g. the Unified Medical Language System), for which you will need to log in to authorise access.
  • Custom - You can provide a custom ontology as a text file in which each line has the form [TERM][TAB][CODE], as shown above.

Annotation

To add an annotation, you must:

Once added, an annotation is displayed both within the document text and in the annotation panel. Within the annotation panel, annotations are grouped by entity and ordered by their appearance in the document text.
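The panel ordering described above amounts to a group-then-sort over character offsets. The Annotation type and group_for_panel function below are illustrative only, not part of Markup; ordering the groups themselves by each entity's first occurrence is an assumption, since this page does not specify it.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    entity: str
    start: int  # character offset where the annotation begins
    end: int    # character offset just past the annotation
    text: str


def group_for_panel(annotations: list[Annotation]) -> dict[str, list[Annotation]]:
    """Group annotations by entity; within each group, order by position in the document."""
    groups: dict[str, list[Annotation]] = defaultdict(list)
    # Sorting first means both the groups (by first occurrence) and their members are in text order.
    for ann in sorted(annotations, key=lambda a: (a.start, a.end)):
        groups[ann.entity].append(ann)
    return dict(groups)
```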

Hovering over an annotation in either the document panel or annotation panel will highlight the annotation in both panels, and will display a tooltip containing information about the annotation (e.g. attributes).

Clicking an annotation within the annotation panel displays a dropdown containing all information associated with the annotation (e.g. attributes), along with options for editing and deleting it.

Markup allows a single region of text to be annotated any number of times, enabling the capture of complex data.

Config Creator

Markup offers an in-built config creator that makes it easier to generate config files by visualising the relationships between entities and attributes.

Data Generation

Markup offers an in-built data generator for producing structured, synthetic data. This data can be used to train the annotation prediction model.

Predictive Annotation Suggestions

Markup includes a novel system that predicts and suggests complex annotations using Active and Sequence-to-Sequence learning models.

When you open or switch to a document, Markup predicts all annotations and displays them in a collapsible drop-down within the annotation panel. Suggestions are grouped and coloured by entity category, and ordered by their appearance in the document text.

Each annotation suggestion will consist of:

Selecting an annotation suggestion will give the user the opportunity to review all details related to the annotation (e.g. attribute values) before accepting, editing or rejecting it, where:

A user's response to a suggestion is fed back into the prediction models to improve future predictions. These adjustments are made only to the user's local model, so they have no impact on predictions for other users.

Export

After completing the annotation process, the user can export all annotations by selecting the Export button in the configuration panel. Exporting produces an annotation file, with the .ann extension, that contains all added annotations in standoff format.
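As a rough illustration of what an exported file contains, the Python sketch below writes annotations as brat-style standoff lines: one T line per annotation and one A line per attribute. The tuple layout and the to_standoff name are hypothetical, and the sketch assumes Markup's .ann output follows brat conventions; the real export may differ in detail, such as how IDs are numbered.

```python
def to_standoff(annotations) -> str:
    """Serialise (entity, start, end, text, attributes) tuples as brat-style standoff lines.

    Covered text is assumed not to contain tabs or newlines.
    """
    lines = []
    attr_id = 1
    for index, (entity, start, end, text, attributes) in enumerate(annotations, start=1):
        lines.append(f"T{index}\t{entity} {start} {end}\t{text}")
        for name, value in attributes.items():
            # Each attribute points back at its text-bound annotation by ID
            lines.append(f"A{attr_id}\t{name} T{index} {value}")
            attr_id += 1
    return "".join(line + "\n" for line in lines)
```

Saving the result under the document's own base name (some-random-file.txt to some-random-file.ann) keeps it compatible with the folder naming rule used when opening multiple documents.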

By default, browsers suggest saving exported annotations to the local Downloads folder. When annotating large numbers of documents, it can be useful to change this default location, which can be done in various browsers as shown in the following guides: Chrome, Firefox, Opera, Safari.