Oxford NINJAL Corpus of Old Japanese (ONCOJ)

Download

This page describes how to download the source files for the ONCOJ corpus. The data is in a Penn Historical-style bracketed tree format (see link below), which allows the use of tools such as TGrep, TGrep2, Tregex, CorpusSearch2 and TSurgeon. The files are plain text files (with the extension .txt) and can be opened in any text editor. There are 4991 individual texts in the corpus, each of which can be downloaded individually as its own .txt file. The corpus has also been compiled into 26 .txt files according to the sources of the texts, and these can be downloaded as well.
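For readers who want to inspect the files without installing one of the tools above, the following is a minimal Python sketch of reading bracketed trees. It is not part of the ONCOJ distribution. The sample tree, its labels and the ID node are made up for illustration, and the assumption that each tree sits inside an unlabeled outer bracket reflects common Penn Historical practice rather than anything confirmed for these files.

```python
import re

# Parentheses are separate tokens; everything else is a run of
# characters containing no whitespace and no parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_trees(text):
    """Parse every top-level bracketed tree in `text`.

    A node is returned as (label, children); a leaf word is a plain string.
    An unlabeled wrapper bracket gets the empty string as its label.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip the opening "("
        label = ""
        if pos < len(tokens) and tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # skip the closing ")"
        return (label, children)

    trees = []
    while pos < len(tokens):
        if tokens[pos] != "(":
            raise ValueError(f"unexpected token {tokens[pos]!r}")
        trees.append(node())
    return trees


def leaves(tree):
    """Return the leaf strings of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]


# Made-up sample in the assumed layout: unlabeled wrapper, clause, ID node.
sample = "( (IP-MAT (NP-SBJ (N word1)) (VB word2)) (ID example_1))"
trees = parse_trees(sample)
clause = trees[0][1][0]
print(clause[0], leaves(clause))
```

To apply this to a downloaded file, read its contents as text and pass them to `parse_trees`. The sketch treats every non-bracket token as either a label or a leaf, so any further conventions in the real files (for example, comments or special tokens) would need extra handling.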

Until March 2022, all the data for periodic releases was available here through a link to a compressed folder containing 26 text files. Since March 2022, the data accessible through the search interface has been derived directly from the source data, which is updated in real time. This data can be downloaded file by file directly through the search interface.