I’m working on a little personal coding project, and I need a list of dictionary words. I’m planning to publish the end result as an open source package, and the code will include (a subset of) said list - so I’d like to find a list of words that is itself published under an open source license.
I’ve found various “dictionary files” online, but none that come with an open source license. Also, many of them contain what I consider to be nonsense words, abbreviations, proper nouns, things like “zzzzz” etc. I’m looking for a list of common words that would reasonably appear in a dictionary.
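For what it's worth, the kind of filtering I have in mind could be sketched roughly like this in Python. The rules are my own assumptions, not anything standard: keep only entries made of two or more lowercase ASCII letters, which drops abbreviations, capitalised proper nouns and single letters, and reject anything with the same letter three times in a row, which catches things like "zzzzz". The names `looks_like_word` and `filter_words` are just made up for illustration.

```python
import re

# Assumption: a "dictionary-like" entry is 2+ lowercase ASCII letters.
# This rejects "NASA", "London", "e.g." and single letters such as "a".
_WORD_RE = re.compile(r"[a-z]{2,}")

# Assumption: no ordinary English word repeats a letter three times in a row.
_TRIPLE_RE = re.compile(r"(.)\1\1")


def looks_like_word(entry: str) -> bool:
    """Return True if the entry passes both heuristic checks."""
    return bool(_WORD_RE.fullmatch(entry)) and not _TRIPLE_RE.search(entry)


def filter_words(lines):
    """Strip whitespace, drop junk entries and duplicates, keep input order."""
    seen = set()
    result = []
    for line in lines:
        entry = line.strip()
        if looks_like_word(entry) and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result
```

Something like `filter_words(open("words.txt", encoding="utf-8"))` would then give a cleaned list, where `words.txt` is a placeholder for whatever file I end up with. It won't catch real-looking nonsense words, so it only narrows things down.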
Wiktionary offers various word lists for different purposes, and its content is dual-licensed under Creative Commons and the GNU Free Documentation License. Essentially, you can use Wiktionary material as long as you acknowledge the source and include a copy of the license texts.
I’ve had a look at this. The license seems to assume that you are going to republish the word lists as documents, rather than incorporate them into software. I’ll have another look and see if I can contact someone at Wiktionary to check that my use case is acceptable under the license. I’d have to find a way to scrape the words themselves from the webpage, but that won’t be too hard, assuming it’s acceptable. (I couldn’t find an option to download them in a text-based format, which is what I need.)
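If the scraping route does turn out to be acceptable, something like the sketch below is roughly what I'd try, using only Python's standard `html.parser`. It assumes, without my having checked the actual page markup, that each word sits in its own `<li>` element with properly closed tags. The class name `ListItemWords` and the function `extract_words` are made up for this example.

```python
from html.parser import HTMLParser


class ListItemWords(HTMLParser):
    """Collect the text of every <li> element (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # number of <li> elements currently open
        self._chunks = []  # text pieces of the item being read
        self.words = []    # finished entries, in page order

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            if self._depth == 0:
                self._chunks = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._chunks).strip()
                if text:
                    self.words.append(text)

    def handle_data(self, data):
        # Only keep text that appears inside a list item.
        if self._depth > 0:
            self._chunks.append(data)


def extract_words(html: str) -> list[str]:
    """Parse an HTML string and return the text of its list items."""
    parser = ListItemWords()
    parser.feed(html)
    parser.close()
    return parser.words
```

The page itself would still need fetching and saving first, and the result would want writing out one word per line to get the text-based format I'm after.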
This sounds promising. The URL above includes a link to the dictionaries page and then to the English dictionary, which is a compressed folder. I confess that upon downloading and uncompressing it, I was hoping to find a file (text, JSON, XML, etc.) that I could easily consume, but I think I need to build it first. I’ve never played with make before, and I’m out of my depth, but I’ll have a play and see what happens. Worst case, I need a new computer anyway.
With most properly set-up packages that you have to build, you just need to be in the right directory and type make.
This assumes you have the proper compilers already. If you’re on a *nix system, you likely do.
There may be a README file that tells you any specifics you need to know.
A simple list of words is not copyrightable, as it doesn’t meet the minimum bar for creative input. A license is irrelevant.
If the list were hand-selected in some way, or presented as structured data, or it contained invented words, the situation might be different. But a comprehensive list of normal English words in alphabetical or random order doesn’t get copyright protection.
The official Scrabble word list is something of an edge case. Hasbro claims copyright, but the claim is fairly dubious (I don’t think it’s ever been tested in court, but Hasbro has sent cease-and-desist letters to alleged offenders). Still, if you stay away from that and similar lists, you’re perfectly safe.
To grab it, click the green Code button in the top right, then click Download ZIP.
It’s Public Domain and, as we all know, Public Domain means never having to say you’re sorry: Its owners have officially given up all ownership in it, meaning anyone, anywhere can use it for any purpose, now and forever until the end of time, amen.