Why in Microsoft Word are the words not automatically hyphenated but in the vast majority of books words are hyphenated? What is the rule on this?
I dunno the official reason, but my take is that hyphenation is unnecessary in the digital era when you’re not worried about saving pages and justified formatting can be automatically implemented through subtle use of spacing.
That said, if you miss your hyphens, just turn on automatic hyphenation.
Contrary to what some people think, justified type should allow hyphenation, to avoid excessive wordspacing or letterspacing. If it’s not the default, turn it on.
Is the OP talking about compound words or line breaks?
Books use hyphens in justified type for historic reasons. Typesetters really did set type, and they set it by hand, placing individual letters into lines called slugs. To even off the lines, the typesetters broke words with hyphens. To make the spacing more even between words they manually inserted tiny spacers, doing so by eye to make an aesthetically pleasing page.
Word is not set up to do book-style justification. Its algorithms are designed to insert spacing electronically between words. That’s all that’s necessary for a manuscript page and most casual newsletters or other projects. For a nicer product, a manual once-over must be done to insert hyphens into words.
The algorithms to do this automatically are still not very good, although I haven’t tried the one in Word 2007. Newspapers, always subject to extreme time pressure, use them with the result that words get broken up at spots that bear no correspondence to no-rmal s-yllabification.
Line breaks.
Actually, the algorithms are very good, in particular Knuth-Plass, as used in TeX and some Adobe products. It can even be shown that they produce an optimal output, and do so quite efficiently (quadratic running time). However, Microsoft, for whatever reason, chose to implement a greedy line breaking/spacing algorithm, which minimizes the number of lines used, instead of Knuth-Plass, which minimizes the raggedness of the lines.
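To make the greedy-versus-optimal distinction concrete, here is a rough Python sketch. It is not Word's or TeX's actual code: the function names are made up for illustration, spaces are fixed-width, there is no hyphenation, and "raggedness" is assumed to be the sum of squared leftover space on every line except the last.

```python
def greedy_breaks(words, width):
    """First-fit: keep adding words to the current line while they fit."""
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and len(" ".join(candidate)) > width:
            lines.append(current)
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(current)
    return [" ".join(line) for line in lines]


def optimal_breaks(words, width):
    """Dynamic program: minimize the total squared trailing space.

    Assumption for this sketch: the last line costs nothing, and a word
    longer than the width is allowed to sit alone on an overfull line.
    """
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimum cost to set words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width and j > i + 1:
                break
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


def raggedness(lines, width):
    """Sum of squared trailing space, ignoring the last line."""
    return sum((width - len(line)) ** 2 for line in lines[:-1])


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6))   # ['aaa bb', 'cc', 'ddddd'], raggedness 16
print(optimal_breaks(words, 6))  # ['aaa', 'bb cc', 'ddddd'], raggedness 10
```

Both versions use three lines here, but the greedy one leaves "cc" stranded on a nearly empty line, while the dynamic program accepts a slightly short first line to even things out. The double loop is where the quadratic running time comes from.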
ETA: after re-reading your post, it appears you may have been talking specifically about the algorithms in Word, as opposed to hyphenation algorithms in general.
There’s no excuse for Word to do a crappy job of hyphenation. I started typesetting in 1972, and though the auto-hyphenation wasn’t perfect, it was better than today’s Word. By the time desktop publishing came to be, the software had very few hyphenation problems.
How do I turn on hyphenation in Word 2000?
I’m curious what you mean by “optimal output”. Does it really have a dictionary indicating where hyphens are allowed in every word, based on proper syllabizing each word, and using that to optimize the line of text?
Sure there’s an excuse. Word was written by evil monkeys.
No. The algorithm minimizes the raggedness of a paragraph by minimizing the square of the space left at the end of each line. TeX makes a guess at where hyphenation should occur using language-specific patterns, and the amount of hyphenation can be controlled by adjusting the hyphenation penalty (TeX defaults to being very ready to hyphenate). If the guess is incorrect, you can tell the typesetting engine manually how a word should be hyphenated using a special command.
I was referring to newspaper hyphenation, which is mostly good but sometimes produces hilariously bad results. I don’t know if there is a general default tool for newspapers. It’s very possible that some of them have proprietary software.
To be fair, newspaper columns have lines that are much shorter than books and are prone to long words or names causing problems. And as said, the time pressures of newspapers are more extreme than any other word business.
Evil monkeys would have a better plan.
They do have a better plan than is usually evident, but, being monkeys, they are incompetent to execute it.
The other option wouldn’t make less sense, though. They already have spelling dictionaries. Adding in hyphens wouldn’t take much. You could always just add invisible hyphens to the new words you use, if the program says you need them.
BTW, I think Microsoft’s leaving out that feature is to get you to buy the more expensive products if you are going for professional book-writing. Heck, even if Word could handle every feature you needed to write a professional book, I bet the actual professional printers wouldn’t use it. And they’re really the only ones who need to.
I’m a newspaper copy editor (admittedly not on the news section, so not such tight deadlines), and although I can confirm that the hyphenation algorithm in InDesign sucks, we do go through and manually break words in more suitable places whenever we can.
Odd. When I was investigating self-publication of a book, the companies that did it would take only Word documents (or WordPerfect, a couple of them). The fact that TeX did a superior job, not just in hyphenation, was irrelevant. Eventually, it was accepted by the American Mathematical Society.
TeX’s hyphenation algorithm is based on analyzing a dictionary of the language in question and finding out where hyphens actually go. The algorithm looks mostly at three-letter sequences and assigns a number based on whether this is a favorable or forbidden hyphenation point. Certain words, like “record”, will not be hyphenated because the noun is rec-ord, while the verb is re-cord. The user is free to add his own hyphenation dictionary (always use re-cord, for example) or add optional hyphens to particular words. It then takes every paragraph and calculates “all possible” line breaks, assigning a “badness” to each one, with hyphens assigned a particular penalty, and then chooses the least bad. There are also language-specific rules, such as in American English, there must be at least three letters before the hyphen and two after (in British English, it is three and three) and you have to add one to your file (unless you want to permit, say, “a-moral”). It all seemed to go fast even in the days of 6 MHz AT computers, the first micro on which I actually ran TeX. Now, a 20-page manuscript compiles in under a second unless it has some heavy graphics.
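For anyone curious what the pattern scoring described above looks like in practice, here is a small Python sketch. The function names and the nine-entry pattern list are assumptions chosen for illustration (a real pattern file has thousands of entries). The convention assumed is the usual one for this kind of scheme: a digit in a pattern scores the gap at that spot, the highest digit from any matching pattern wins, odd means a break is allowed, and even means it is forbidden. The minimum letters before and after a hyphen are passed in as parameters rather than hard-coded.

```python
import re


def parse_pattern(pat):
    """Split e.g. 'hy3ph' into letters 'hyph' and gap scores [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pat)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min, right_min):
    """Return the word with '-' at every gap whose winning score is odd."""
    padded = "." + word.lower() + "."   # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)    # scores[k]: gap before padded[k]
    for letters, values in patterns:
        start = padded.find(letters)
        while start != -1:
            for offset, value in enumerate(values):
                scores[start + offset] = max(scores[start + offset], value)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        # The gap before word[i] is the gap before padded[i + 1].
        if scores[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Tiny illustrative pattern list, not a real pattern file.
PATTERNS = [parse_pattern(p) for p in
            "hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n".split()]

print(hyphenate("hyphenation", PATTERNS, 2, 3))  # hy-phen-ation
print(hyphenate("hyphenation", PATTERNS, 3, 2))  # hyphen-ation
```

The second call shows the effect of the minimum-letters rule: requiring three letters before the hyphen suppresses the break after "hy". A user exception list (such as always breaking "record" one way) would simply be a dictionary consulted before the patterns.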