This page provides the source files for the entire ONCOJ corpus for download. At present, all the data is available as a folder of 26 text files in the bracketed-tree format of the Penn Historical Corpora. This format can be searched and manipulated with tools such as TGrep, TGrep2, Tregex, CorpusSearch2 and TSurgeon.
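For readers who want to process the bracketed trees outside the tools listed above, the following is a minimal Python sketch of a reader for Penn-style labelled bracketing. It assumes that each file contains only bracketed trees, with no extra text between them, and that leaves contain no parentheses. The helper names `parse_trees` and `leaves`, and the sample tree, are hypothetical illustrations. They are not part of the ONCOJ distribution, and real ONCOJ labels and words may differ.

```python
import re
from dataclasses import dataclass, field

# A token is "(", ")", or any run of characters that is neither whitespace nor a bracket.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


@dataclass
class Node:
    label: str  # node label; "" for an unlabelled wrapper bracket
    children: list = field(default_factory=list)  # Node objects or leaf strings


def parse_trees(text):
    """Parse every top-level bracketed tree in `text` (hypothetical helper)."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' but found {tokens[pos]!r}")
        pos += 1
        label = ""
        if pos < len(tokens) and tokens[pos] not in ("(", ")"):
            label = tokens[pos]  # the first token after "(" is the label
            pos += 1
        node = Node(label)
        while True:
            if pos >= len(tokens):
                raise ValueError("unbalanced brackets")
            if tokens[pos] == ")":
                pos += 1  # consume the closing bracket
                return node
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:
                node.children.append(tokens[pos])  # a leaf (word)
                pos += 1

    trees = []
    while pos < len(tokens):
        trees.append(parse_node())
    return trees


def leaves(node):
    """Return the leaf strings of `node` in left-to-right order."""
    out = []
    for child in node.children:
        if isinstance(child, Node):
            out.extend(leaves(child))
        else:
            out.append(child)
    return out


# A made-up tree in the Penn Historical style, wrapped in an unlabelled outer bracket.
sample = "( (IP-MAT (NP-SBJ (N kimi)) (VB ku)) )"
tree = parse_trees(sample)[0]
print(tree.children[0].label, leaves(tree))  # IP-MAT ['kimi', 'ku']
```

To read one of the downloaded files instead of the sample string, pass its contents to `parse_trees`, for example `parse_trees(open(path, encoding="utf-8").read())`. The UTF-8 encoding here is an assumption about the files. For real queries, the dedicated tools listed above are the better choice.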
Later updates will add a folder with one file per text in Alpino format, and a folder with one file per text in a TEI-convertible format (that is, a format that can be converted into a TEI-compatible file with some simple processing).
Download source files
- Bracketed trees (text): oncoj.zip
- Alpino format (XML): available in a later update
- TEI-convertible format (XML): available in a later update