This page provides the source files for the entire ONCOJ corpus for download. At present all the data is available in a single folder containing 26 text files in a Penn Historical-style bracketed tree format, which can be searched and processed with tools such as TGrep, TGrep2, Tregex, CorpusSearch2 and TSurgeon.
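For readers who would rather process the bracketed files from a script than with the tools listed above, the following is a minimal sketch of a reader for Penn-style bracketed trees in Python. It rests on assumptions not stated on this page: that each tree is wrapped in an outer unlabeled pair of parentheses, as is usual in Penn Historical-style files, and that no token contains a literal parenthesis. The sample tree, its labels, its words and its ID are invented for illustration and are not taken from the corpus; `parse_trees` and `leaves` are hypothetical helper names.

```python
import re


def parse_trees(text):
    """Parse Penn-style bracketed trees into nested (label, children) tuples.

    Children are either nested tuples or plain strings (the leaves).
    Assumes balanced brackets; unbalanced input raises IndexError.
    """
    # Split the input into "(", ")" and runs of non-space, non-bracket text.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # skip the opening "("
        label = ""
        # The outer wrapper of a tree has no label of its own.
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    trees = []
    while pos < len(tokens):
        if tokens[pos] != "(":
            raise ValueError(f"unexpected token outside a tree: {tokens[pos]!r}")
        trees.append(parse_node())
    return trees


def leaves(tree):
    """Return the leaf strings of a parsed tree, left to right."""
    _label, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out


# Invented sample tree (an assumption, not corpus data).
SAMPLE = "( (IP-MAT (NP-SBJ (N hana)) (VB saku)) (ID sample,1))"

if __name__ == "__main__":
    for tree in parse_trees(SAMPLE):
        print(leaves(tree))
```

To read a downloaded file, pass its contents to `parse_trees`. Note that `leaves` also returns the content of any ID node, so filter on the node label if only the words are wanted.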

Later updates will add a folder with one file per text in Alpino format, and a folder with one file per text in a TEI-convertible format (that is, a format that can be converted to a TEI-compatible file with some simple processing).

Download source files

  1. Bracketed trees (text)
  2. Alpino format (xml)
  3. TEI convertible format (xml)