What’s the longest word that’s a substring of an unrelated other word?
Recently I encountered the word anorthosite, which is a rock that’s not very common on Earth, but common on the Moon. I noted that it has a substring that’s another word: anorthosite. I’ve also been aware for some time of a six-letter example: chemotherapy, which contains “mother”. So does anyone know of any longer ones?
Note the requirement that the two words need to be unrelated. So the root word of a longer one with suffixes or prefixes is right out. Also out are the components of compound words.
You only care about the length of the substring word and not the enclosing word, right? I’m curious why you chose chemotherapy as an example when “smother” is a simpler example of a word containing “mother”. Do you have a criterion that makes chemotherapy preferable to smother?
I’d spotted the chemotherapy example a while back and thought it might make a good cryptic crossword entry, were I ever to decide to compose one (very unlikely). Much like the famous city in Czechoslovakia. Anyway, because of that, it was on my mind.
Very good. And I’m certain there are more and longer ones where the first letter is removed, a process called a beheadment among word play enthusiasts. But I’d rather not turn this into finding the longest beheadment†, so let me add a couple more requirements:
There must be at least one letter removed from both the beginning and ending of the long word to get the shorter word. Furthermore, if only a single letter is removed from the beginning, the letters removed from the end cannot be a suffix or inflection.
†Nothing wrong with beheadments. It’s just a different form of word play.
Depending on exactly how you define two words to be “related”, some possibilities are
presentational in representational
vertibility in incontrovertibility
termination in determination
solvableness in irresolvableness
lightfulness in delightfulness
sumptuously in presumptuously
nationalize in denominationalize
autological in tautological
thematical in mathematical
terminable in indeterminable
… Argh, just saw your new requirements. Ok, starting over.
I suspect you could find a lot more of these in mineral names - for instance, Akrochordite, Ashoverite, Bilibinskite, Brockite, Canavesite, Halloysite, Jamesonite, Kidwellite, Rambergite, Schröckingerite etc. I’m sure there are lots more words buried in there.
Since this is FQ and not TG, I’ll note that the OP’s question seems like something that could be done programmatically, as long as one had a sufficient English word list.
And, yes, my suggestion is that, if you want to make this into a game instead of just a Factual Question, it might work better in Thread Games.
True, it could be done by program. But it’s not a simple filter program. You’d have to read through the list hundreds of thousands, perhaps millions of times. Maybe I’m underestimating the capabilities of modern computers, but that will take a long time.
I’ve thought about how I would tackle it programmatically, and there are ways to make it more efficient by making two lists, both of which are subsets of the usual dictionary lists. But it still seemed like a very long-running program.
I’ll let the moderators move it if they see fit. I wasn’t sure what forum it should go under when I posted it.
It doesn’t seem like it should take that long: take word A, iterate through the list and see if it’s in any other words. Take word B, do the same, and so on. Output all matches to a new list and sort by length. And you can optimize: if word A is at least as long as the word you’re checking it against, skip that check, since it can’t fit inside. You can also only check words longer than some length n, since we’re not interested in words with “a” and “on” and “she,” etc., in them. I’m sure there are other optimizations possible. I can’t see this taking long at all, even with a naive implementation.
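For what it's worth, here is a minimal Python sketch of that naive pairwise scan. The function name `naive_inner_words`, the `min_len` cutoff, and the toy word list are my own assumptions for illustration; a real run would load a full word list instead. Note that it only finds substrings and makes no attempt to judge whether the two words are "related".

```python
def naive_inner_words(words, min_len=4):
    """Return (inner, outer) pairs where inner is a substring of a longer word."""
    # Ignore very short words such as "a", "on" and "she".
    candidates = [w for w in words if len(w) >= min_len]
    pairs = []
    for inner in candidates:
        for outer in candidates:
            # A word can only sit inside a strictly longer word, so skip the rest.
            if len(outer) <= len(inner):
                continue
            if inner in outer:
                pairs.append((inner, outer))
    # Longest inner words first; the sort is stable, so ties keep scan order.
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    return pairs


# Toy word list (an assumption, not a real dictionary).
sample = ["mother", "smother", "chemotherapy", "north", "anorthosite", "she"]
print(naive_inner_words(sample))
```

On the toy list this reports "mother" inside both "smother" and "chemotherapy", and "north" inside "anorthosite". It still counts beheadments like "smother", because it doesn't apply the both-ends rule.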
OK, that’s better than the algorithm I had in mind, but probably only in its simplicity. It’s O(n^2) where n is the length of the list. Which I think is the same as my algorithm.† Probably the best optimization would be to make a temporary list with only words of whatever length is desired or longer, since that reduces n.
† I had the idea of reading a word, generating its substrings of the minimum length or longer, and checking each to see if it’s in the list. There are all kinds of optimizations that can be done with this algorithm, including, as I indicated above, generating two temporary subset lists.
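The substring-generation idea in that footnote might look roughly like the sketch below. It assumes the word list fits in a Python `set`, so each "is this a word?" check is a hash lookup rather than another pass over the list. The name `inner_words_by_lookup` and the `min_len` parameter are hypothetical. It also builds in the rule that at least one letter must come off each end, but it can't tell whether the removed letters are a prefix, suffix or inflection; that part would still need a human.

```python
def inner_words_by_lookup(words, min_len=4):
    """Find words hidden strictly inside other words via set lookups."""
    word_set = set(words)
    found = []
    for outer in word_set:
        n = len(outer)
        # start >= 1 drops at least one leading letter;
        # end <= n - 1 drops at least one trailing letter.
        for start in range(1, n - min_len):
            for end in range(start + min_len, n):
                piece = outer[start:end]
                if piece in word_set:
                    found.append((piece, outer))
    # Longest inner word first, then alphabetical for a stable order.
    found.sort(key=lambda pair: (-len(pair[0]), pair[0], pair[1]))
    return found


# Toy word list (an assumption, not a real dictionary).
sample = ["mother", "smother", "chemotherapy", "north", "anorthosite", "moth"]
for inner, outer in inner_words_by_lookup(sample):
    print(inner, "in", outer)
```

The work per word grows with the square of its length instead of with the size of the list, so the total is roughly the number of words times a small constant for ordinary English words.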
I’m not even sure that all that much optimization is needed. We’ve already found 7-letter examples, so all we need check is words of 8 letters or more for the inner words, and 10 letters or more for the outer words. It seems to me that that would pare down the size of our word list enough that O(n^2) wouldn’t be so bad.
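A rough sketch of that pruning, assuming the both-ends rule, so that an outer word has to be at least two letters longer than the inner word it contains. The function name `pruned_search` and the `min_inner` parameter are made up for illustration, and the toy list is only there to show the call; it is not a claim about what a real dictionary would turn up.

```python
def pruned_search(words, min_inner=8):
    """Pairwise search restricted to words long enough to beat the current record."""
    min_outer = min_inner + 2  # one extra letter is required at each end
    inners = [w for w in words if len(w) >= min_inner]
    outers = [w for w in words if len(w) >= min_outer]
    hits = []
    for outer in outers:
        core = outer[1:-1]  # strip one letter from each end before searching
        for inner in inners:
            if len(inner) <= len(core) and inner in core:
                hits.append((inner, outer))
    # Longest inner word first, then alphabetical.
    hits.sort(key=lambda pair: (-len(pair[0]), pair[0], pair[1]))
    return hits


# Toy word list (an assumption); min_inner is lowered so the short sample produces hits.
sample = ["mother", "smother", "chemotherapy", "north", "anorthosite"]
print(pruned_search(sample, min_inner=5))
```

It's still a pairwise comparison, but only between the two shortened lists, which is where the savings would come from.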