What's the process of indexing a book?

I’m reading ‘Citizens’ by Simon Schama. It’s a history of the French Revolution. It’s a pretty hefty book.

Anyways, I was looking through the index for references to Thomas Jefferson and it struck me how thoroughly indexed the book is.

How is it determined what will be indexed? Does the editor sort of make up a master list of what he feels is worthy of being indexed, and then a team is given that list and they read the book and note each instance and page corresponding to what is on it?

In the book I’m reading it’s obvious that, for example, “Louis XVI, trial and execution of: page 627” is gonna be included in the index, but there’s also tons of obscure stuff: things relating to his style of dress, eating habits, opinions on ______. And the same goes for everything else in the book. It must be a tremendous task. So what’s the process?

I can’t search for it right now, but I’m pretty sure that in MPSIMS we had a discussion with a board poster who has worked as an indexer. I know I cited an issue with the first edition of Mark Bittman’s “How to Cook Everything” in the thread, so that may be a good searching assist.

Traditional old-school indexing used – drumroll – index cards. The indexer would go through the book and write a particular heading on an index card, along with the page where it was found. If there was a second reference, he or she would write that page on the same card. If there was a new topic, the indexer would take a new card and write down that subject. The cards would be kept in alphabetical order and you’d keep going through the book jotting down potential subjects and the page on which each appeared. Then you’d take the cards and type them up as a list with page numbers. The indexing is done with the manuscript in final paste-up galley form, so the page numbers are correct.
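For anyone who thinks better in code, here is a rough Python sketch of that index-card procedure. It is only an illustration: `compile_cards` and the sample notes are made up for this post (page 627 is the one from the OP; the other headings and page numbers are invented).

```python
from collections import defaultdict


def compile_cards(notes):
    """Turn (heading, page) notes, taken in reading order, into a typed-up index."""
    cards = defaultdict(set)            # one "card" per heading
    for heading, page in notes:
        cards[heading].add(page)        # a repeat reference goes on the same card
    # "Type up" the cards: headings in alphabetical order, pages ascending.
    return [(h, sorted(cards[h])) for h in sorted(cards, key=str.casefold)]


# Hypothetical notes jotted down while reading.
notes = [
    ("Louis XVI, trial and execution of", 627),
    ("Jefferson, Thomas", 44),
    ("Jefferson, Thomas", 310),
    ("Jefferson, Thomas", 44),          # same page noted twice, kept once
]

for heading, pages in compile_cards(notes):
    print(f"{heading}: {', '.join(map(str, pages))}")
```

A set per heading means noting the same page twice does no harm, which is about what happens with a real card too.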

Nowadays it’s done electronically. The indexer marks the text, enters the subject for the index, and the software compiles it.

Paging MsWhatsit!

You’ll learn a lot about the process by reading her thread “Ask the Professional Indexer.”

The last book of my own that I indexed was 400 pages long and it took me two weeks to index it – and that’s after marking things as I went. I’m not sure how long it would have taken a pro like MsWhatsit.

I use Adobe InDesign to format books, and it has an index function. You create entries and they are placed invisibly within the text. The software will create the final index, and put in the page number where that marker is.

The tricky part is to anticipate what your readers will want to look for and how they’ll do so. Say you wrote a section on Fonts, and you wanted to be sure readers could find it. You would, of course, make an entry for ‘Font.’ But a good indexer might make an entry for ‘Typeface’ as well, even if that word never appears in the section, because the reader might think of that word instead of ‘Font.’

I’ve talked to people new to this that think they can get it all done by just doing searches for particular words and adding entries wherever they come up. You can do that, but that’s not going to create a quality index. You shouldn’t just index words, you should index topics. The word might appear in places that don’t actually discuss it, and the topic might be written about without using the word. The software helps a great deal by keeping track of the page numbers for you and in formatting the final index automatically, but it cannot recognize the difference between a word and the topic it represents.
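Here is a minimal Python sketch of the marker idea, assuming a made-up `{{idx:...}}` marker syntax. This is not how InDesign stores its entries internally; `build_index`, `strip_markers`, and the sample pages are all hypothetical. It is only meant to show why a marker can carry a topic that never appears as a word on the page.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax: {{idx:Heading}} hidden in the running text.
MARKER = re.compile(r"\{\{idx:(.*?)\}\}")


def build_index(pages):
    """Collect every marker and record the page it sits on (pages count from 1)."""
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for entry in MARKER.findall(text):
            index[entry].add(page_number)
    return {entry: sorted(index[entry]) for entry in sorted(index)}


def strip_markers(text):
    """Return the text as the reader sees it, with markers removed."""
    return MARKER.sub("", text)


pages = [
    "Choose the lettering style carefully.{{idx:Font}}{{idx:Typeface}}",
    "The baptismal font stood near the door.",      # the word, but not the topic
    "Serif faces suit long passages.{{idx:Font}}",  # the topic, but not the word
]

print(build_index(pages))
```

A plain search for "font" would have picked up page 2 and missed pages 1 and 3, which is the whole difference between indexing words and indexing topics.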

By the way, I am not really an indexer. I have created a lot of indexes, but only on books I’ve laid out and typeset. I don’t index other people’s publications, and I don’t think I’d be all that good at it. But I know how it’s done and have great respect for those that do it well. It requires really understanding the text while at the same time being able to conjure the mindset of someone who doesn’t.

Indexing is both an art and a science. As Saltire just said, you have to anticipate what people might be looking for. You can’t just put “Dope, Straight” in the index and think you’re done when many (most?) readers will be looking for “Straight Dope” (or “Cecil Adams” or “Adams, Cecil” or “Chicago Reader” or …).

Although I’ve mentioned it before, my favorite bit of indexing was in a computer book where I included:

Endless loop: See Loop, endless

Loop, endless: See Endless loop
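Outside of jokes, a circular "See" reference is a real defect, and it is easy to check for mechanically. A small Python sketch, assuming the cross-references are kept as a plain dictionary (`find_see_loops` and the "Typeface" entry are made up for illustration):

```python
def find_see_loops(see_refs):
    """Return the headings whose chain of 'See' references circles back on itself."""
    looping = []
    for start in see_refs:
        seen = {start}
        current = start
        while current in see_refs:          # keep following "See" pointers
            current = see_refs[current]
            if current in seen:             # we have been here before: a loop
                looping.append(start)
                break
            seen.add(current)
    return sorted(looping)


see_refs = {
    "Endless loop": "Loop, endless",
    "Loop, endless": "Endless loop",
    "Typeface": "Font",                     # ends at a real entry, so it is fine
}

print(find_see_loops(see_refs))
```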

It isn’t helpful to the reader to have too many page numbers piled up after a given index entry. An important aspect of indexing, one where the real art comes in, which requires a trained human brain, is when to break down a topic into smaller subtopics, to prevent too many page numbers under a given entry. For example, if a book mentioned Alice B. Toklas on a hundred different pages, you’d indent the subtopics pertaining to her under Toklas, Alice B.:
–Attitudes toward poetry,
–Attitudes toward politics,
–Autobiography of, The,
–Belief in the afterlife,
–Dress size,
–Hashish brownies, see Recipes attributed to
–Life in Paris,
–Life in San Francisco,
–Musical studies,
–Recipes attributed to,
–Relationship with Gertrude Stein, see Stein, Gertrude, --Relationship with Alice B. Toklas,
–Shoe size, etc.

Roughly 10–15 pages under a given entry are plenty; with more than that it gets too hard on the reader to leave them undifferentiated, so break out a subtopic. Now this is not an easy rule to follow when you’re jotting on index cards and you’ve gotten two-thirds of the way through the book, and then discover you’ll need to go back and start breaking out subtopics from the start. It’s a good idea, time permitting, to read all the way through the book first and get familiar with how many pages are likely to appear on a given subtopic. But on a tight deadline this can get pretty challenging. That’s where the expert indexers earn their pay. It isn’t a job anyone can just jump right in and do well.
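If the index is being compiled electronically, the "too many page numbers" check is something software can at least flag, even though deciding what the subtopics should be stays a human job. A small Python sketch, assuming the index is a dictionary of heading to page list; `needs_subtopics`, the threshold default, and the Stein pages are illustrative only:

```python
def needs_subtopics(index, max_pages=15):
    """List headings with more undifferentiated page numbers than max_pages."""
    return sorted(
        heading
        for heading, pages in index.items()
        if len(set(pages)) > max_pages
    )


index = {
    "Toklas, Alice B.": list(range(1, 101)),   # mentioned on a hundred pages
    "Stein, Gertrude": [3, 9, 41],             # hypothetical page numbers
}

print(needs_subtopics(index))
```

Run after every chapter or so, a check like this gives early warning that a heading is filling up, before you are two-thirds of the way through the book.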

Another concept where the true art of indexing comes in is, as Saltire pointed out, to anticipate what readers will be looking for. You cannot automate this process; you have to give it careful thought. One method is to aim for specific concrete things, and as much as possible avoid vague, squishy topics. Specificity is your friend. How to determine just the right topics for a book is something dependent on the specific content of each book. There’s no one size fits all.

I self-indexed my last book, dropping anchors into the text; the software then collected them, alphabetized them, and made an index. As suggested upthread, there were also index anchors that keyed on a word not actually in the text. Although I am not a professional indexer, I knew that no one else would undertake the task, and I felt that some index was necessary and that what I did was better than nothing.

It is a tremendous task. I’ve done it for one of my own books and never will again.

Software indexing has advantages and disadvantages. Obviously, it will find every instance of a word, something the eye can miss, but what happens if the word is both a proper noun and an object, Phoenix, say, or Lincoln the capital vs. Lincoln the President? Somebody has to check that the index means what it says. Even if there aren’t those double meanings, some books will index every mention and some will not and there are arguments for doing it either way. “Mentions of the comet appeared in papers in Buffalo, Cleveland, Pittsburgh, and Cincinnati.” Do you index all those cities or not?

The answers to these and the thousand other questions depend on the topic of the book and what you want readers to be able to extract. Context is everything, but it’s rare to find an index that properly evaluates context to shape its entries. I’m seeing a trend in modern indexes in which they are overwhelmingly proper nouns rather than concepts. That’s software at work, undoubtedly, and it’s fatal for researchers.

A good concept index requires a mind that understands the work and can parse sentences on every page. Very rare, but very valuable.
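To make the Lincoln problem concrete, here is a short Python sketch. The sample sentences, the `pages_mentioning` helper, and the headings are all invented for illustration. The search finds every page with the word, but a person still has to say which Lincoln each page is about.

```python
import re


def pages_mentioning(word, pages):
    """Return the page numbers (from 1) where the whole word appears."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return [n for n, text in enumerate(pages, start=1) if pattern.search(text)]


pages = [
    "Lincoln signed the proclamation.",
    "The train stopped in Lincoln, Nebraska.",
    "The comet was not mentioned here.",
]

hits = pages_mentioning("Lincoln", pages)   # software: both senses lumped together

# The human step: someone reads each hit and assigns the right heading.
decisions = {1: "Lincoln, Abraham", 2: "Lincoln (Nebraska)"}

index = {}
for page in hits:
    index.setdefault(decisions[page], []).append(page)

print(hits)
print(index)
```

The `decisions` table is the part no search can fill in, and it is exactly what goes missing when an index is built by keyword alone.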

40 years ago (before the computer age), I had a professor who said that the only person who could properly index a book was the author himself.
The example above (of the comet seen in 4 cities) shows the reason: the author knows what is important to the audience he is writing for.
Unlike a computer, or a professional librarian or indexer who only skimmed the text, the author knows whether he wants the reader to concentrate on the fact that a comet was seen, or that a comet was seen in particular cities.

Hogwash. A good reader can divine the author’s intent in any book that’s halfway decently written. That’s what a professional indexer is expected to do, which is why “only the author can index a book” is baloney. If that were true, it would mean the author is the only one who can understand the book. Further, a good indexer sees it from the reader’s point of view, something authors aren’t always capable of. The indexer is the interface between author and reader. It’s a unique position demanding well-honed literacy skills.

Johanna and chappachula, you’re both right in a way. Here’s my perspective as an author:

An indexer knows indexing and understands how people use indexes. The professional indexer brings a perspective that I might not have, indexing things that wouldn’t have occurred to me because (to me, anyway) they’re obvious.

On the other hand, I have a deeper understanding of the subject material and the intended audience than the indexer. There are nuances of the subject matter that I will see that an indexer might miss. Where the indexer knows what someone new to the topic might look for in an index, I know what subject matter experts might look for – and those words can be quite different.

In other words, an outstanding index requires collaboration between the author and the indexer.

In my case, the publisher requires that I index the books myself. I still use index cards, Reality Chuck to the contrary notwithstanding. (I get to do this again in another month or so). Going through the text and marking down the pages with index cards really does give you the most direct way of marking the relevant pages, with appropriate subheadings, and prevents the sort of “let’s tag every time the keyword appears” blitzing that blindly using software can lead to.

I remember when my father was indexing his first important multi-volume book. In his case, and to this day, he winds up doing the indexing because he is the only one who knows all the source languages he works with.

Once the book was in press, naturally he kept all those index cards in his desk drawer to reuse. I would go up to the pile, pick one at random, and demand where it appeared and what it was.

My mother was the first director of electronic indexing and abstracting at a huge technology professional society. I learned about Booleans when I was about eight.

Do fictional accounts carry any weight in this forum? In Cat’s Cradle by Kurt Vonnegut, one of the characters is a professional book indexer, who warns that one should never index one’s own books.

The Freudian-sounding argument implied there is that a self-indexer will inevitably allow his indexing choices to be colored by his own personality or other baggage. In discussing a book by one of the other characters (who indexed his own), she notes that she could tell too much, then clammed up and refused to say more. Later her husband says that she could tell that the author is gay.

Hey, it’s just fiction. (And, being Cat’s Cradle, rather silly fiction.) I put in this post here mainly just for fun.

I imagine that most modern indexers would use a “hybrid” or “multipass” approach, though. Wouldn’t they? Run a software indexer to generate a list of all the words (perhaps with a filter that throws out words like “the”) – maybe more advanced software can parse proper nouns or phrases that the indexer enters as well. Then the indexer can organize and prune the list, merge entries, abbreviations, and synonyms that the software can’t handle on its own, and so on. All the advantages of having a professional indexer, with less human error causing one to miss words and such.
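Something like the following Python sketch is what I have in mind. It is a guess at the workflow, not a description of any real indexing package; the stopword list, `candidate_terms`, `merge`, and the sample pages are all made up.

```python
import re
from collections import defaultdict

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was"}


def candidate_terms(pages):
    """Pass 1 (software): every non-stopword, with the pages it occurs on."""
    found = defaultdict(set)
    for n, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                found[word].add(n)
    return found


def merge(candidates, heading_for):
    """Pass 2 (indexer): map chosen words to headings; unmapped words are pruned."""
    index = defaultdict(set)
    for word, page_set in candidates.items():
        heading = heading_for.get(word)
        if heading is not None:
            index[heading] |= page_set
    return {h: sorted(index[h]) for h in sorted(index)}


pages = [
    "The font is set in the stylesheet.",
    "A typeface was chosen.",
    "Fonts and the comet.",
]

# The indexer's decisions: plurals and synonyms merged under one heading.
heading_for = {"font": "Font", "fonts": "Font", "typeface": "Font", "comet": "Comet"}

print(merge(candidate_terms(pages), heading_for))
```

Note that the `heading_for` table is still entirely a human product, so this saves bookkeeping rather than judgment.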

ETA: And I suspect that good software would have other advantages, like auto-updating page numbers if the font or page size changes, or you receive a minor edit that adds a paragraph here and there.
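A toy Python illustration of the auto-updating part, assuming each index anchor is stored as a character offset into the text and that every page holds a fixed number of characters. Real layout software paginates very differently; `paginate` and the offsets are hypothetical.

```python
def paginate(anchors, chars_per_page):
    """Convert anchor offsets into page numbers for a given page capacity."""
    return {
        entry: sorted({offset // chars_per_page + 1 for offset in offsets})
        for entry, offsets in anchors.items()
    }


# Hypothetical anchors: character offsets from the start of the book.
anchors = {"Font": [120, 4300], "Comet": [2600]}

print(paginate(anchors, 2000))   # original layout
print(paginate(anchors, 1000))   # smaller pages: same anchors, new page numbers
```

Because the anchors travel with the text rather than with the page, changing the layout only means rerunning the last step.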

Software unquestionably has advantages. I use software assistance whenever possible.

I also read lots and lots and lots of nonfiction. My impression - and I can’t use any word stronger than that - is that recent indexes are shorter and use fewer theme entries. I think there’s a connection between that and poorly used indexing software.

Some graduate student is probably making a dissertation out of this.

This is obviously a good occasion to express my disgust with indexes to medical/scientific texts that seem more interested in systematic consistency than getting you to the subject you’re interested in.

I refer specifically to looking up a term (let’s say, sinonasal rhinosporidiosis) and finding it alphabetically listed with the admonishment “See rhinosporidiosis, sinonasal”. Would it really kill you pissants to just put down the page number?
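For what it's worth, if the index lives in software, giving both headings the page numbers (indexers call this double-posting) is trivial. A Python sketch, with a made-up page number and a hypothetical `double_post` helper:

```python
def double_post(index, see_refs):
    """Replace each 'See' reference with a copy of its target's page numbers."""
    resolved = {heading: list(pages) for heading, pages in index.items()}
    for alias, target in see_refs.items():
        if target in index:                  # leave dangling references out
            resolved[alias] = list(index[target])
    return dict(sorted(resolved.items()))


index = {"rhinosporidiosis, sinonasal": [212]}   # page number invented
see_refs = {"sinonasal rhinosporidiosis": "rhinosporidiosis, sinonasal"}

print(double_post(index, see_refs))
```

The trade-off is length: every double-posted entry repeats its locators, which is presumably why long entries get a "See" instead.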