6 Principles in post-editing

Many of the errors in automatic tagging occurred in areas where a grammarian has difficulty drawing a borderline: between participles and adjectives for -ed forms; between nouns, adjectives, and participles for -ing forms; between conjunction and preposition for 'as'; between conjunction and adverb for 'so'; and so on. The main difficulty in post-editing was to achieve a reasonable degree of consistency in such cases. Some general principles applied in post-editing have been:

  1. to keep a tag assigned by the automatic tagging programs unless there are good reasons against it (the 'follow-the-tagger principle');
  2. in cases where a change is necessary, to use classification criteria which can be applied as simply and consistently as possible (the 'consistency principle');
  3. in cases of doubt, to give each word its most 'normal' tag, e.g. NNS rather than NN for 'means' (the 'normalcy principle').
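Read together, the three principles amount to a decision procedure applied to each token. The Python sketch below is one possible rendering of that procedure, offered only as an illustration. Several things in it are assumptions rather than details from this manual: the order in which the principles are checked, the hypothetical rule `so_that_rule`, the hypothetical `NORMAL_TAG` table, and the use of CS and RB as conjunction and adverb tags.

```python
from typing import Callable, Dict, List, Optional

# A rule inspects a word (and the following word) and returns a replacement
# tag, or None if it does not apply. Rules are kept simple so that they can
# be applied the same way every time (the 'consistency principle').
Rule = Callable[[str, Optional[str]], Optional[str]]


def so_that_rule(word: str, next_word: Optional[str]) -> Optional[str]:
    # Hypothetical illustrative rule, not taken from the post-editing itself:
    # treat 'so' directly before 'that' as a conjunction (tag CS).
    if word.lower() == "so" and next_word is not None and next_word.lower() == "that":
        return "CS"
    return None


# Hypothetical table of 'normal' tags for words whose classification is doubtful.
NORMAL_TAG: Dict[str, str] = {"means": "NNS"}


def post_edit(word: str, next_word: Optional[str], auto_tag: str,
              rules: List[Rule], doubtful: bool = False) -> str:
    """Return the post-edited tag for one token (illustrative sketch)."""
    # Consistency principle: if a simple, explicit rule applies, use its tag.
    for rule in rules:
        new_tag = rule(word, next_word)
        if new_tag is not None:
            return new_tag
    # Normalcy principle: in cases of doubt, use the word's most 'normal' tag,
    # if one has been recorded.
    if doubtful and word.lower() in NORMAL_TAG:
        return NORMAL_TAG[word.lower()]
    # Follow-the-tagger principle: otherwise keep the automatically assigned tag.
    return auto_tag


if __name__ == "__main__":
    rules = [so_that_rule]
    print(post_edit("so", "that", "RB", rules))                   # CS
    print(post_edit("means", None, "NN", rules, doubtful=True))   # NNS
    print(post_edit("means", None, "NN", rules))                  # NN
```

Checking the rules before the doubt case reflects the idea that an explicit, simple criterion counts as a "good reason" to override the tagger; where no criterion applies and the case is genuinely doubtful, the normal tag wins, and otherwise the tagger's output stands.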

While we have tried to find a classification that is linguistically justifiable, this has not always been possible. For one thing, it would have meant tackling grammatical problems that are still awaiting a solution. A particular problem is that we have chosen to draw a borderline and assign a single tag to each occurrence of a word, although we know that gradience and fuzzy borderlines are characteristic of language (cf. Johansson 1985). In the sections below we shall draw attention to some problematic areas.[8]



Page last update 20. May -98
Anne Lindebjerg