ICAME CORPUS COLLECTION - INFORMATION

LOB Corpus, tagged version, horizontal format


A running text in which each word is immediately followed by an underscore and its word-class tag (the tagset has 134 distinct tags).

Example:
A01   2 ^ *'_*' stop_VB electing_VBG life_NN peers_NNS **'_**' ._. 
A01   3 ^ by_IN Trevor_NP Williams_NP ._. 
A01   4    ^ a_AT move_NN to_TO stop_VB \0Mr_NPT Gaitskell_NP from_IN
A01   4 nominating_VBG any_DTI more_AP labour_NN 
A01   5 life_NN peers_NNS is_BEZ to_TO be_BE made_VBN at_IN a_AT meeting_NN 
A01   5 of_IN labour_NN \0MPs_NPTS tomorrow_NR ._. 
A01   6    ^ \0Mr_NPT Michael_NP Foot_NP has_HVZ put_VBN down_RP a_AT
A01   6 resolution_NN on_IN the_ATI subject_NN and_CC 
A01   7 he_PP3A is_BEZ to_TO be_BE backed_VBN by_IN \0Mr_NPT Will_NP 
A01   7 Griffiths_NP ,_, \0MP_NPT for_IN Manchester_NP 
A01   8 Exchange_NP ._. 
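
The horizontal format above can be read with a few lines of Python. The following is a minimal sketch, not an official ICAME tool: the function name parse_horizontal_line is hypothetical, the layout (text identifier, line number, then word_TAG items) is inferred only from the sample above, and the treatment of "^" as a marker without a tag is an assumption based on how it appears in that sample.

```python
def parse_horizontal_line(line):
    """Split one line of the horizontal format into its parts.

    Returns (text_id, line_number, tokens), where tokens is a list of
    (word, tag) pairs, or None if the line has no reference prefix.
    Items without an underscore (such as "^" in the sample) get tag None.
    """
    parts = line.split()
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    text_id, line_number = parts[0], int(parts[1])
    tokens = []
    for item in parts[2:]:
        # Split on the LAST underscore so that "._." gives (".", ".").
        word, sep, tag = item.rpartition("_")
        if sep:
            tokens.append((word, tag))
        else:
            # Assumption: a bare item such as "^" is a marker, not a word.
            tokens.append((item, None))
    return text_id, line_number, tokens


if __name__ == "__main__":
    sample = [
        r"A01   3 ^ by_IN Trevor_NP Williams_NP ._. ",
        r"A01   4    ^ a_AT move_NN to_TO stop_VB \0Mr_NPT Gaitskell_NP from_IN",
    ]
    for raw in sample:
        print(parse_horizontal_line(raw))
```

Splitting on the last underscore rather than the first keeps punctuation items such as ._. and ,_, intact. Codes embedded in the words (for example \0 before an abbreviation) are passed through unchanged.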
 

LOB Corpus, tagged version, vertical format


Each word is on a separate line, together with its tag, a reference number, and some additional information (indicating whether the word is part of a heading, a naming expression, a quotation, etc.).

Example:
A01 2 001 ----- --------------------------------------   
A01 2 002 *'    *'       H   
A01 2 010 VB    stop                           H 
A01 2 020 VBG   electing                       H 
A01 2 030 NN    life                           H 
A01 2 040 NNS   peers                          H 
A01 2 041 **'   **'                            H 
A01 2 042 .     .        H    @  
A01 3 001 ----- --------------------------------------   
A01 3 010 IN    by       H   
A01 3 020 NP    Trevor                         H 
A01 3 030 NP    Williams                       H 
A01 3 031 .     .        H    @  
A01 4 001 ----- --------------------------------------   
A01 4 010 AT    a               P    
A01 4 020 NN    move 
A01 4 030 TO    to   
A01 4 040 VB    stop 
A01 4 050 NPT   \0Mr                      \0 
A01 4 060 NP    Gaitskell    
A01 4 070 IN    from 
A01 4 080 VBG   nominating   
A01 4 090 DTI   any  
A01 4 100 AP    more 
A01 4 110 NN    labour                          N    
A01 5 010 NN    life 
A01 5 020 NNS   peers                           N    
A01 5 030 BEZ   is   
A01 5 040 TO    to   
A01 5 050 BE    be   
A01 5 060 VBN   made 
A01 5 070 IN    at   
A01 5 080 AT    a    
A01 5 090 NN    meeting  
A01 5 100 IN    of   
A01 5 110 NN    labour                          N    
A01 5 120 NPTS  \0MPs                     \0 
A01 5 140 NR    tomorrow 
A01 5 141 .     .    
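
A record in the vertical format above can be split in a similar way. This is a rough sketch under stated assumptions: the function name parse_vertical_line and the dictionary keys are hypothetical, the field order (text identifier, line number, position within the line, tag, word, optional codes) is inferred from the sample, and the fields are separated on whitespace because the column positions in the sample are not uniform. The distributed files may use fixed columns, in which case slicing by column would be more reliable. The trailing codes are kept as they are and not interpreted.

```python
def parse_vertical_line(line):
    """Parse one record of the vertical format into a dictionary.

    Returns None for separator rows (tag made up only of hyphens)
    and for lines with fewer than five fields.
    """
    parts = line.split()
    if len(parts) < 5:
        return None
    text_id, line_number, position, tag, word = parts[:5]
    if set(tag) == {"-"}:
        # Rows such as "A01 2 001 ----- -----..." separate corpus lines.
        return None
    return {
        "text": text_id,
        "line": int(line_number),
        "position": position,   # kept as a string to preserve "010"
        "tag": tag,
        "word": word,
        "flags": parts[5:],     # additional codes, left uninterpreted
    }


if __name__ == "__main__":
    sample = [
        "A01 2 001 ----- --------------------------------------   ",
        "A01 2 010 VB    stop                           H ",
        "A01 2 042 .     .        H    @  ",
    ]
    for raw in sample:
        print(parse_vertical_line(raw))
```

Keeping the position as a string preserves values such as 001 and 041 exactly as they appear in the file.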
 

LOB Corpus, WordCruncher version


This is an indexed version of the tagged LOB Corpus (horizontal format). It can only be used with WordCruncher.

Example:
|CA:Press:reportage 
|SA01 
|P1 ^ *'_*' stop_VB electing_VBG life_NN peers_NNS **'_**' ._. 
|P2 ^ by_IN Trevor_NP Williams_NP ._. 
|P3    ^ a_AT move_NN to_TO stop_VB Mr\_NPT Gaitskell_NP from_IN
nominating_VBG any_DTI more_AP labour_NN life_NN peers_NNS is_BEZ to_TO be_BE 
made_VBN at_IN a_AT meeting_NN of_IN labour_NN MPs\_NPTS tomorrow_NR ._. 
|P4    ^ Mr\_NPT Michael_NP Foot_NP has_HVZ put_VBN down_RP a_AT resolution_NN 
on_IN the_ATI subject_NN and_CC he_PP3A is_BEZ to_TO be_BE backed_VBN by_IN 
Mr\_NPT Will_NP Griffiths_NP ,_, MP\_NPT for_IN Manchester_NP Exchange_NP ._. 
 

 

LOB Corpus, untagged version, text


The LOB Corpus is a British English counterpart of the Brown Corpus. It contains approximately a million words of printed text (500 text samples of about 2,000 words each). The text of the LOB Corpus is not available on microfiche.

Example:
A01   1 **[001 TEXT A01**] 
A01   2 *<<*'*7STOP ELECTING LIFE PEERS**'*>> 
A01   3 *<<*4By TREVOR WILLIAMS*>> 
A01   4    |^A *0MOVE to stop \0Mr. Gaitskell from nominating any more Labour 
A01   5 life Peers is to be made at a meeting of Labour {0M P}s tomorrow.
A01   6    |^\0Mr. Michael Foot has put down a resolution on the subject and 
A01   7 he is to be backed by \0Mr. Will Griffiths, {0M P} for Manchester
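
Each line of the untagged text above starts with a reference (text identifier and line number) followed by the text itself. The sketch below separates the two. The name split_reference and the regular expression are assumptions inferred from the sample only. The markup codes inside the text (for example *<, **[ and \0) are left untouched, since their meaning is not described here.

```python
import re

# Assumed layout: a letter and two digits, spaces, a line number, one space, text.
REF_LINE = re.compile(r"^([A-Z]\d{2})\s+(\d+) ?(.*)$")


def split_reference(line):
    """Return (text_id, line_number, text) or None if the line has no reference."""
    match = REF_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


if __name__ == "__main__":
    print(split_reference(r"A01   7 he is to be backed by \0Mr. Will Griffiths, {0M P} for Manchester"))
```

Only one space after the line number is consumed, so any further indentation in the text (as on the lines that open a paragraph in the sample) is preserved.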

 


Conditions on the use of ICAME corpus material

The primary purposes of the International Computer Archive of Modern English (ICAME) are:

  1. collecting and distributing information on (i) English language material available for computer processing; and (ii) linguistic research completed or in progress on this material;
  2. compiling an archive of corpora to be located at the University of Bergen, from where copies of the material can be obtained at cost.

The following conditions govern the use of corpus material distributed through ICAME:

  1. No copies of corpora, or parts of corpora, are to be distributed under any circumstances without the written permission of ICAME.
  2. Print-outs of corpora, or parts thereof, are to be used for bona fide research of a non-profit nature. Holders of copies of corpora may not reproduce any texts, or parts of texts, for any purpose other than scholarly research without getting the written permission of the individual copyright holders, as listed in the manual or record sheet accompanying the corpus in question. (For material where there is no known copyright holder, the person(s) who originally prepared the material in computerized form will be regarded as the copyright holder(s).)
  3. Commercial publishers and other non-academic organizations wishing to make use of part or all of a corpus or a print-out thereof must obtain permission from all the individual copyright holders involved.
  4. The person(s) who originally prepared the material in computerized form must be acknowledged in every subsequent use of it.

Use of ICAME texts within an institution

Although ICAME texts may not be used or distributed outside the institution that placed the order, they may be used freely within that institution (department, faculty, university) for research and teaching. To prevent commercial or profit-making use of the material, it is advisable to limit access to registered computer users within the institution. How this is done may vary from institution to institution.