This is a list of language-related datasets that I have either created myself or compiled from other sources. I am making them available under copyleft or open-source licenses.
Machine-readable lists of lemma-token pairs in 23 languages.
An Irish-English dictionary for learners of Irish with over 5,000 entries.
About 4,500 sentences in Irish, tokenized, manually lemmatized and translated into English.
About 6,500 Irish lemmas (= "words") ordered by corpus frequency, with noise removed.
Princeton WordNet converted into a relational database format.
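As an illustration of how a lemma-token list such as the one mentioned above might be used, here is a minimal Python sketch that builds a token-to-lemma lookup. The file layout is an assumption, not a description of the actual downloads: the sketch supposes one pair per line in the order lemma, tab, token. The function name `load_lemma_token_pairs` and the sample data are hypothetical.

```python
import io
from collections import defaultdict


def load_lemma_token_pairs(lines):
    """Build a token -> set-of-lemmas lookup.

    Assumes (hypothetically) one 'lemma<TAB>token' pair per line;
    the real files may use a different layout.
    """
    lookup = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and lines without a tab separator.
        if "\t" not in line:
            continue
        lemma, token = line.split("\t", 1)
        lookup[token].add(lemma)
    return lookup


# Tiny in-memory sample standing in for a real file (Irish forms).
sample = io.StringIO("bean\tbean\nbean\tmná\nbean\tmnaoi\nfear\tfir\n")
lookup = load_lemma_token_pairs(sample)
print(sorted(lookup["mná"]))
```

A set is used for the values because one inflected token can belong to more than one lemma. To read a real file, pass an open file handle instead of the `io.StringIO` sample.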