British National Corpus

The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. It was compiled as a general corpus (collection of texts) in the field of corpus linguistics. The corpus covers British English of the late twentieth century from a wide variety of genres with the intention that it be a representative sample of spoken and written British English of that time.

The 10-million-word spoken component has two parts: a demographic part, containing transcriptions of spontaneous natural conversations recorded by members of the public, and a context-governed part, containing transcriptions of recordings made at specific types of meeting and event. All the original recordings transcribed for inclusion in the BNC have been deposited at the National Sound Archive of the British Library.

The corpus is marked up following the recommendations of the Text Encoding Initiative and includes full linguistic annotation and contextual information. The most recent edition, from March 2007, is distributed in XML format along with the XAIRA software. It is freely available under a licence and is very widely distributed.
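
Because the corpus is distributed as annotated XML, its word-level annotation can be read with any standard XML parser. The following is a minimal sketch in Python, not the XAIRA interface. The sample fragment is invented for illustration, and the element and attribute names used here (s for a sentence, w for a word, c for punctuation, c5 for the word-class tag, hw for the headword, pos for a simplified word class) are assumptions modelled on the conventions of the XML edition; they should be checked against the corpus documentation before use.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only, not quoted from the corpus. The names
# s, w, c, c5, hw and pos are assumed for this sketch.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN1" hw="corpus" pos="SUBST">corpus </w>'
    '<w c5="VBZ" hw="be" pos="VERB">is </w>'
    '<w c5="AJ0" hw="large" pos="ADJ">large</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def tokens(xml_text):
    """Return (surface form, headword, word-class tag) for each word element."""
    root = ET.fromstring(xml_text)
    return [
        ((w.text or "").strip(), w.get("hw"), w.get("c5"))
        for w in root.iter("w")
    ]


if __name__ == "__main__":
    for form, headword, tag in tokens(SAMPLE):
        print(form, headword, tag)
```

The function skips punctuation elements and strips the trailing space kept inside each word element, so the result is a plain list of annotated words that can be counted or filtered by tag.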
