DiAnnotator is a dialogue annotation tool. It is meant to reduce the need to use the mouse to annotate dialogues and to improve keyboard-only annotation speed and reliability. DiAnnotator can be used to segment utterances, to apply dialogue act or sentiment-analysis labels, to link text segments and to modify taxonomies. DiAnnotator is fully multi-layer, and therefore supports annotation schemes such as DAMSL and ISO 24617-2.
DiAnnotator is written in Python 3.4.3.
The following modules are required to run the program:
- `nltk`
- `ttkthemes`
- `undo`
Simply run the Python script `launcher.py` in the `src` folder.
Run the `build.sh` script at the root of the project directory. A `bin` folder containing an executable will be created at the root of the project directory.
The file must contain a header line and as many additional lines as there are text segments. Columns must be separated by tabs, not commas, and there should be no text delimiter.
The following columns are used when importing data:
- `datetime`: Contains a string representation of the date and time at which the message was sent. The string must respect the international standard date and time notation to be parsed correctly.
- `raw`: Contains the raw text of the message.
- `segment`: Contains the text segment. When a message has more than one segment (due to prior segmentation, for example), the `raw` column should be filled only in the first row and left empty in the following rows. Conversely, when a single segment covers more than one raw message (due to prior merging, for example), the `segment` column should be filled only in the first row and left empty in the following ones.
- `participant`: Contains the name or ID of the speaker.
- `note`: Contains a note or comment pertaining to the segment.
- Columns bearing a layer name are used to load legacy annotations, which can serve as useful hints when producing new annotations. These columns' names must follow the naming convention of the taxonomy's JSON file.
- Columns bearing a layer name and suffixed by `-value` are used to load legacy qualifiers for that layer. For example, if `emotion` is a layer, `express` may be a label, and `happiness` might be the value present in the `emotion-value` column. These columns' names must follow the naming convention of the taxonomy's JSON file.
- `id`: Contains a unique identifier for the segment. This column is optional, but it is required for importing links.
- `links`: Contains the list of links emanating from this segment, their types and their targets. The link format is the following: `<target_id>-<link_type>,<target_id>-<link_type>`.
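The link format described above can be parsed with a few lines of Python. The following is only a minimal sketch, not DiAnnotator's own parser: the function name `parse_links` is hypothetical, and it assumes that target identifiers contain no hyphen, so that the first hyphen of each item separates the target from the link type.

```python
def parse_links(cell):
    """Parse a links cell such as "75-rhetorical relation,12-feedback".

    Returns a dict mapping each link type to the list of target identifiers
    (kept as strings, exactly as they appear in the cell).
    """
    links = {}
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty cells and trailing commas
        # Assumption: the target identifier contains no hyphen, so the
        # first hyphen separates it from the link type.
        target_id, sep, link_type = item.partition("-")
        if not sep or not target_id or not link_type:
            raise ValueError("malformed link: %r" % item)
        links.setdefault(link_type, []).append(target_id)
    return links


print(parse_links("75-rhetorical relation,12-feedback"))
```

Grouping the targets by link type gives the same shape as the `links` dictionary used in the JSON format.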
| participant | datetime | segment | raw | activity | social | feedback | feedback-value |
|---|---|---|---|---|---|---|---|
| manu | 11-05-17 13:05 | hi guys! | hi guys! does anyone know how to install a proprietary graphics card driver? | | greet | | |
| manu | 11-05-17 13:05 | does anyone know how to install a proprietary graphics card driver? | | question | | | |
| gabi | 11-05-17 13:06 | search software sources, and then it's in the additional drivers tab | search software sources, and then it's in the additional drivers tab | answer | | | |
| manu | 11-05-17 13:07 | k thx! | k thx! | | thanks | acknowledge | positive |
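As an illustration of the file layout, the sketch below reads tab-separated data with Python's standard `csv` module and groups consecutive segments that belong to the same raw message. It is not DiAnnotator's import code: the helper names `read_segments` and `group_by_message` are hypothetical, the sample rows are illustrative, and only the "one message, several segments" case is handled (not the merged case, where the `segment` cell is left empty).

```python
import csv
import io


def read_segments(handle):
    """Yield one dict per data row of a tab-separated file object."""
    # QUOTE_NONE follows the "no text delimiter" rule: quotes are plain text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Short rows yield None for missing cells; normalise them to "".
        yield {key: (value or "") for key, value in row.items()
               if key is not None}


def group_by_message(segments):
    """Group segments; a row with a filled `raw` cell starts a new message."""
    messages = []
    for segment in segments:
        if segment.get("raw") or not messages:
            messages.append([])
        messages[-1].append(segment)
    return messages


# Illustrative sample data (header line plus three segments).
sample = (
    "participant\tdatetime\tsegment\traw\tactivity\n"
    "manu\t11-05-17 13:05\thi guys!\thi guys! does anyone know?\t\n"
    "manu\t11-05-17 13:05\tdoes anyone know?\t\tquestion\n"
    "gabi\t11-05-17 13:06\tsearch software sources\tsearch software sources\tanswer\n"
)
segments = list(read_segments(io.StringIO(sample)))
messages = group_by_message(segments)
print(len(segments), "segments in", len(messages), "messages")
```

When reading from disk, the `csv` documentation recommends opening the file with `newline=""`, for example `open(path, newline="", encoding="utf-8")`.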
The file must contain an ordered list of dictionaries, each dictionary representing a single segment.
Most fields are the same as for CSV data, with a few exceptions.
The segment dictionary must not contain a field for each layer. Instead, it must have an `annotations` field that contains a dictionary. Each key of that dictionary is a layer name and each value is another dictionary, with a `label` field containing the segment's label for that layer and, optionally, a `qualifier` field containing the qualifier for that layer.
Links must be represented as a dictionary in the optional `links` field of the segment dictionary. Each key of that dictionary represents a link type, and the corresponding value must be the list of identifiers of the target segments. The `id` field must be set for all segments for links to be imported correctly.

    {
      "id": 76,
      "segment": "Hello Gix,",
      "raw": "Hello Gix, how are you doing?",
      "participant": "Poggy",
      "datetime": "04-11-17 22:17",
      "note": "performed split here",
      "links": {
        "rhetorical relation": [
          75
        ]
      },
      "annotations": {
        "social obligations management": {
          "label": "greet"
        }
      }
    }
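The difference between the two formats can be made concrete by converting a flat, CSV-style row into a JSON-style segment. This is a hedged sketch only: `row_to_segment` and `CORE_FIELDS` are hypothetical names, the list of layers is passed in by the caller, and links are left out for brevity.

```python
import json

# Fields that keep the same name in both formats.
CORE_FIELDS = ("id", "segment", "raw", "participant", "datetime", "note")


def row_to_segment(row, layers):
    """Build a JSON-style segment dict from a flat row dict."""
    segment = {key: row[key] for key in CORE_FIELDS if row.get(key)}
    annotations = {}
    for layer in layers:
        label = row.get(layer)
        if not label:
            continue  # no annotation on this layer
        annotations[layer] = {"label": label}
        # The qualifier lives in the "<layer>-value" column of the flat row.
        qualifier = row.get(layer + "-value")
        if qualifier:
            annotations[layer]["qualifier"] = qualifier
    segment["annotations"] = annotations
    return segment


row = {
    "participant": "manu",
    "datetime": "11-05-17 13:07",
    "segment": "k thx!",
    "raw": "k thx!",
    "social": "thanks",
    "feedback": "acknowledge",
    "feedback-value": "positive",
}
segment = row_to_segment(row, layers=["activity", "social", "feedback"])
print(json.dumps(segment, indent=2))
```

Layers without a label are simply omitted from `annotations`, so the nested dictionary only lists the layers that are actually annotated.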
When a data file is loaded, a taxonomy must be chosen before annotation can begin. Taxonomies are saved in JSON format.
The taxonomy fields are:
- `name`: The taxonomy's name.
- `url`: The URL of the taxonomy's website or documentation.
- `default`: The default layer of the taxonomy, which is the active layer when a data file is first loaded.
- `labels`: A dictionary of lists, whose keys are layer names and whose values are the lists of labels available in each layer.
- `qualifiers`: A dictionary of lists, whose keys are layer names and whose values are the lists of qualifiers available in each layer.
- `links`: A dictionary whose keys are the link types and whose values are the colors in which the links should be displayed.
- `colors`: A dictionary whose keys are layer names and whose values are hexadecimal color codes used for displaying labels. The `colors` field is mandatory, but the dictionary may be left empty, in which case labels are displayed in a randomly generated color.

    {
      "name": "J-22 Tax",
      "url": "www.somewhere-university.edu/j22tax",
      "default": "Task",
      "labels": {
        "Task": [
          "Inform",
          "Confirm",
          "Disconfirm",
          "Commit",
          "Offer",
          "Instruct",
          "Suggest",
          "Request Information",
          "Request Directives"
        ],
        "Communication": [
          "Correct",
          "Completion"
        ],
        "Other": [
          "Announce",
          "Preclose",
          "Switch Topic"
        ]
      },
      "qualifiers": {
        "Communication": [
          "Perception",
          "Interpretation",
          "Evaluation"
        ]
      },
      "links": {
        "Feedback": "#A6E22E",
        "Functional": "#54D6EF",
        "Rhetoric": "#ffffff"
      },
      "colors": {
        "Task": "#FFFFFF",
        "Communication": "#DB5CD7",
        "Other": "#DB5CD7"
      }
    }
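Before loading a hand-written taxonomy, it can help to check it for obvious mistakes. The sketch below is one possible sanity check and not part of DiAnnotator: `check_taxonomy` is a hypothetical helper, and it assumes that all seven fields are required (the text above only states this explicitly for `colors`) and that colors are six-digit hexadecimal codes.

```python
import json
import re

# Assumption: every field listed above must be present.
REQUIRED_FIELDS = ("name", "url", "default", "labels",
                   "qualifiers", "links", "colors")
# Assumption: colors are written as "#" followed by six hex digits.
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def check_taxonomy(taxonomy):
    """Return a list of problems found; an empty list means none."""
    problems = ["missing field: " + field
                for field in REQUIRED_FIELDS if field not in taxonomy]
    if problems:
        return problems
    layers = taxonomy["labels"]
    if taxonomy["default"] not in layers:
        problems.append("default layer is not defined in labels")
    for field in ("qualifiers", "colors"):
        for layer in taxonomy[field]:
            if layer not in layers:
                problems.append("unknown layer in %s: %s" % (field, layer))
    for field in ("links", "colors"):
        for key, color in taxonomy[field].items():
            if not HEX_COLOR.match(color):
                problems.append("bad color for %s: %s" % (key, color))
    return problems


taxonomy = json.loads("""
{
  "name": "J-22 Tax",
  "url": "www.somewhere-university.edu/j22tax",
  "default": "Task",
  "labels": {"Task": ["Inform", "Confirm"], "Communication": ["Correct"]},
  "qualifiers": {"Communication": ["Perception"]},
  "links": {"Feedback": "#A6E22E"},
  "colors": {}
}
""")
print(check_taxonomy(taxonomy))
```

Returning a list of messages rather than raising on the first error lets all problems in a file be reported in one pass.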
Black buttons at the bottom of the screen show the possible labels for the active layer. The active segment is the last one displayed, appearing in bold. To apply a label to a segment, click on the appropriate button or type a sufficiently discriminating part of the label (for example, `req dir` for `request directives`), then press Enter.
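The exact matching rule for typed labels is not specified here. One plausible interpretation, given purely as an assumption, is that each typed fragment must be a prefix of the label word in the same position, and that the input is accepted only when exactly one label matches. The function `match_label` below is a hypothetical sketch of that idea, not the tool's actual algorithm.

```python
def match_label(query, labels):
    """Return the only label matched by `query`, or None if the query
    is empty, matches nothing, or is ambiguous."""
    parts = query.lower().split()
    if not parts:
        return None
    matches = []
    for label in labels:
        words = label.lower().split()
        # Each typed fragment must be a prefix of the word at the same
        # position in the label (assumed rule, see the lead-in).
        if len(parts) <= len(words) and all(
                word.startswith(part) for part, word in zip(parts, words)):
            matches.append(label)
    return matches[0] if len(matches) == 1 else None


labels = ["Inform", "Instruct", "Request Information", "Request Directives"]
print(match_label("req dir", labels))
print(match_label("req", labels))
```

With these labels, `req dir` selects `Request Directives`, while `req` alone is ambiguous and yields `None`.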
Changes the active layer.
Removes label and qualifier from the active segment.
If the active segment is not linked to any other segment, links it to a specific segment, selected by index after selecting the link type. If the active segment already has a link of the selected type to another segment, that link is removed.
Removes all links emanating from the active segment.
Splits the active segment in two, on the chosen token. Links, notes, annotations and legacy annotations are preserved.
Merges the active segment to the previous ones. Links, notes, annotations and legacy annotations are preserved.
Adds a new layer, label, qualifier or link type.
Updates the name of the layer, label, qualifier or link type used for the active segment. All segments annotated with this layer, label, qualifier or link type will be affected.
Jumps to a specific segment, selected by index.
Filters the collection by label, legacy label, layer, legacy layer, qualifier or legacy qualifier.
The next entry creates a note and attaches it to the active segment. If the active segment already has a note attached, the note is deleted.
Sends an entry, pushes a button on which the focus is set, or moves on to the next segment if the entry field is empty.
Moves down one segment.
Moves up one segment.
Moves down ten segments.
Moves up ten segments.
Deletes the active segment.
Toggles legacy annotations display.
Randomizes participant colors.
Filters segments by active layer.
Filters segments by active label.
Filters segments by active qualifier.
Toggles between fullscreen and windowed mode.
Exits the application.
Increases text font size.
Decreases text font size.
Moves focus from button to button, and back to the entry field.
Closes the current file.
Opens the "open file" dialogue.
Opens the "save as..." dialogue.
Opens the "import file" dialogue.
Opens the "export as..." dialogue.
Opens the "import taxonomy" dialogue.
Opens the "export taxonomy as..." dialogue.