I think the title says it all, but here is my problem.
I have a text file full of words made up of Unicode (UTF-8) characters. I need to create a list of the words, ordered by frequency of occurrence.
I have no knowledge of programming languages (Python, C++, etc.), but I am willing to learn or do anything to get this (seemingly) simple task done. Can anyone help me?
Also, I would like a program that can do this for a file that exists on my computer. A web-based program probably can't handle a large set of data, and splitting the text up by cutting, pasting and processing it in blocks defeats the purpose of ordering the words by frequency.
I also need a program that works with Unicode (non-Latin scripts).
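For a local, script-based route, here is a minimal Python 3 sketch (not from any of the answers below). It assumes words are separated by whitespace, so it will not segment scripts written without spaces (Chinese, Japanese, Thai); those need a real word segmenter. The file name `wordfreq.py` and the functions `strip_punct` and `count_words` are hypothetical names chosen for illustration. It reads the file one line at a time, so file size is not a problem, and only the distinct words are held in memory.

```python
import sys
import unicodedata
from collections import Counter
from typing import Iterable


def strip_punct(token: str) -> str:
    """Trim punctuation (any Unicode category starting with "P") from both ends."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def count_words(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated words, ignoring case and edge punctuation."""
    counts: Counter = Counter()
    for line in lines:
        # casefold() is Unicode-aware lowercasing; NFC makes precomposed and
        # combining-mark spellings of the same word compare equal.
        line = unicodedata.normalize("NFC", line.casefold())
        # str.split() with no argument splits on any Unicode whitespace, and
        # keeps combining vowel signs (e.g. in Devanagari) inside the word.
        for token in line.split():
            word = strip_punct(token)
            if word:  # skip tokens that were pure punctuation, e.g. "..."
                counts[word] += 1
    return counts


def main() -> None:
    if len(sys.argv) < 2:
        print("usage: python wordfreq.py FILE", file=sys.stderr)
        return
    # Write UTF-8 even where the console default is a legacy code page (3.7+).
    sys.stdout.reconfigure(encoding="utf-8")
    # Iterating over the open file reads it one line at a time.
    with open(sys.argv[1], encoding="utf-8") as f:
        counts = count_words(f)
    # most_common() with no argument lists every word, most frequent first.
    for word, n in counts.most_common():
        print(f"{n}\t{word}")


if __name__ == "__main__":
    main()
```

Usage would be something like `python wordfreq.py mytext.txt > counts.txt`, which writes one "count, tab, word" line per distinct word, highest count first.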
A day late and a dollar short, but here is a way to do this without using someone else’s website:
Take your text and paste it into Excel, all in one cell. Format it as a single paragraph.
Use “Text to Columns” to convert the paragraph to a row with one word in each cell.
Copy the row, and use “Paste Special” with the “Transpose” option to convert your row to a column, one word in each cell.
Sort the column.
Use “Subtotals” on the column, and ask it to “Count” the number of times each word appears.
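For anyone who later wants to script this spreadsheet recipe, the same split, sort and count-the-runs idea looks like this in Python. It is only a sketch of the technique, not something Excel runs; `subtotal_counts` is a made-up name, and it assumes words are separated by whitespace, without stripping punctuation or folding case.

```python
from itertools import groupby


def subtotal_counts(text: str) -> list[tuple[str, int]]:
    # "Text to Columns": break the paragraph into words on whitespace.
    words = text.split()
    # "Sort": identical words end up next to each other.
    words.sort()
    # "Subtotals" with "Count": each run of equal words becomes one row.
    rows = [(word, sum(1 for _ in run)) for word, run in groupby(words)]
    # Final step the spreadsheet leaves to you: most frequent words first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


print(subtotal_counts("the cat saw the dog and the cat"))
# prints [('the', 3), ('cat', 2), ('and', 1), ('dog', 1), ('saw', 1)]
```

Python's sort compares code points, so the alphabetical order of ties may differ from Excel's.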
I actually couldn’t use the program because it was too slow to process the amount of data I had (after 5 hours it was nowhere near 10% done). However, as luck would have it, I found this and used the MS Word macro to do the job.
For anyone else who ever faces this task (and who happens to have a Mac), I recommend this standalone program, which will generate word lists from text files, URLs, or anything you type or paste in manually:
Sorry, I didn’t think a link was necessary. Doing the Google search without putting quotes around the phrase (like this) yields about 700,000 results, and the first page is almost all programs or websites that will do the job.