[Corpora-List] LUCY Corpus available

From: Geoff Sampson (grs2@cogs.susx.ac.uk)
Date: Mon Nov 24 2003 - 14:03:20 MET

    The initial release of the LUCY Corpus is now freely available for downloading.
    The LUCY Corpus is a treebank sampling modern written British English of
    three genres:

    * edited published prose

    * the writing of young adults (e.g. A-level exam scripts, 1st-year
       undergraduate essays)
       
    * spontaneous writing by 9- to 12-year-old children

    Compilation of the LUCY Corpus was sponsored by the Economic and Social
    Research Council (UK). The corpus is named after St Lucia or Lucy, patron
    saint of writers.

    The corpus is structurally annotated in conformity with the SUSANNE annotation
    scheme, defined in my _English for the Computer_ (Clarendon, 1995).
    Extensions to the scheme were developed in the LUCY project in order to
    represent what is going on in cases where unskilled writers fail to produce
    written structures that succeed in expressing their apparent intention.

    Documentation for the LUCY Corpus, including a definition of the annotation
    conventions just mentioned, can be read as a Web page at
    www.grsampson.net/LucyDoc.html (13,000 words). The Corpus itself is
    available via www.grsampson.net/Resources.html, as are earlier resources from
    my stable.

    The initial LUCY release will undoubtedly contain mistakes. (That is
    particularly likely, since pressure from the sponsor for early
    publication meant that there was not enough time for all the checks that
    would ideally have been applied.) Users who find errors are warmly urged
    to contact me with details, which will be used to produce later, more
    accurate releases. My e-mail address, in a form designed to foil spammers,
    is grs2 followed by at-sign followed by sussex.ac.uk

    Geoffrey Sampson MA PhD MBCS
    Professor of Natural Language Computing

    Department of Informatics
    University of Sussex
    Falmer, Brighton BN1 9QH, England

    t +44 1273 678525
    f +44 1273 671320
    w www.grsampson.net

    e-mail address no longer shown to avoid spam flood


