Yak Shaving and Typography

Sat 02 February 2019 by psu

There are few things you know for sure in this world. But here is one that has been true for the last 30 years or so, and probably will be true for as many years as I can see into the future.

Fact: every student anywhere in the world who decides to do an advanced degree in math, science, engineering, and sometimes even related fields in the humanities will have one thing in common, no matter their background, their interests, or their future lives. They'll all have to learn \(\rm\TeX\) to write their papers and books, and they'll have one of the greatest yak shaving exercises in history to thank for it.

If you've used \(\rm\TeX\) or done any computer programming in the last 40 years, you probably know what I'm going to say. But for the younger people in the audience (and because I like the story) here is the story.

Part 1: Prehistory

In 1968, at the tender age of thirty, Donald Knuth published the first volume of a book that was supposed to be about compilers. Volume 1 of The Art of Computer Programming instead formed the basis for what would become the first systematic and theoretical investigation of techniques for designing and analyzing algorithms\(^{1}\) in the future field of academic computer science.

Over the next ten years he would publish three volumes (1. Fundamental Algorithms, 2. Seminumerical Algorithms, and 3. Sorting and Searching) and also updated editions of volumes 1 and 3.

It's hard to overstate the influence of these books, especially at the time. Over the years, at least in my opinion, they have become more of an extremely well written semi-historical survey than something a working programmer, or even working academic, would look at on a day to day basis. But still, it is an incredible work.

Then in 1976 he started to update volume 2. And here is where our story really starts.

Part 2: The Shaving Begins

The original printings of TAOCP were done on semi-automatic mechanical typesetting devices built by a company called Monotype beginning in the late 19th century. Someone would actually type all the text on a keyboard where it was recorded on paper tape. Then a second machine would read the tape and cast metal type in real time from hot metal\(^{2}\) into galleys, one for each page. Then you make paper prints from the galleys and offset plates from the master pages and then you use the plates to print the books. For complicated technical books the typographer would also have to do a lot of hand-setting of type for the mathematical formulas and programs and whatnot. It was exactly as tedious and time consuming as it sounds.

In the late 70s the mechanical machines were starting to be phased out and replaced with machines that made the offset plates directly using a photographic system to place the text on the plates. In 1976 Knuth made enough changes to his second volume that all the text had to be reset ... but the old mechanical Monotype machines at his publisher had been retired and the new photo-based machines did not generate text that looked the same. Knuth was left with the following choices:

  1. Have the new edition of Volume 2 look different and weird.
  2. Have the new edition of Volume 2 printed in England, where the mechanical machines still existed, at much higher cost.
  3. Give up.

He was in fact thinking about giving up when he saw some output, apparently generated by a computer program, from a new high resolution laser-based digital typesetting machine. "Computer program? I can write computer programs!", he thought to himself. And the idea for \(\rm\TeX\) was born.

Over the next few years Knuth (and his graduate students) built what would become the first versions of two programs: \(\rm\TeX\) for typesetting technical papers and books, and METAFONT (I can't do the real logo in HTML, apparently, oh well) for creating fonts. The goal was to make digital versions of the type used in the original books (a font called Monotype Modern 8A\(^{3}\)) and also a new markup language to allow the text of the books to be in digital form.

The first large scale test of this new system was the 1981 second edition of Volume 2. While Knuth was satisfied with proofs generated by his new system, the printed and bound versions of the new Volume 2 did not make him happy. I can say from personal experience that the second edition of Volume 2 does not really look like the originals. The fonts, especially the sans serif, are just not quite there yet.

That said, even if the fonts did not look right, the typesetting was a triumph. The first final version of \(\rm\TeX\) as we know it was finished in 1982 and released out to the world. While Knuth worked to improve his amateur font designs (with the help of many people who knew a lot more than he did), \(\rm\TeX\) (mostly in the form of \(\rm\LaTeX\)) became the standard way to write and publish technical text, especially in academic circles.

Part 3: Knuth and Me

I first looked at the Knuth books in high school just as I was getting interested in computer programming. What confused me at the time was why all that math was in there. My feeling was that math did not have much to do with computer programming. I was young, ignorant, and very much overestimated how much I understood about computer programming at the time. If I had lived in 2018 I would have posted a lot of stupid shit on twitter about this. Happily I didn't, and by the time I graduated from college and went to study computer science in graduate school I had a better understanding of what Knuth was going on about and I had my own copy of the books.

I first came across \(\rm\TeX\) in college, but didn't really do battle with it in earnest until graduate school. Like most graduate students I ended up fighting with \(\rm\TeX\) to get my thesis done. When I got out of the academic game I did my best to forget about Knuth and his software. \(\rm\TeX\) is a precisely designed and functional tool for what it does, but from a software engineering perspective it's kind of a disaster. The fact that the source code is mashed up inside a giant \(\rm\TeX\) document so that you can print it as a book does not help\(^{4}\). Through the 80s and 90s publishing of course evolved from phototypesetting to fully digital printing workflows to publishing based on no printing at all. There are now tools by the hundreds to help you do this work, so you would have thought that \(\rm\TeX\) would have died a slow death. But this has not been the case.

But I do not print things or write books, so for twenty five happy years I did not think about \(\rm\TeX\). Knuth, meanwhile, finished METAFONT and his fonts and got back to writing his books in 1986, after shaving the yak for almost ten years. Just kidding: the new editions of his books did not appear until the late 90s. I lost track of why. He is now about a third of the way through the material for Volume 4, but he has a long way to go before even getting to the section on NP ... so don't hold your breath. The shaving will continue for the foreseeable future.

Part 4: Modern \(\rm\TeX\)

These days \(\rm\TeX\) has become almost a standard wire format, if you will, for the expression of mathematical formulas in printed or more recently in electronically printed form. You can even put \(\rm\TeX\) commands into web pages now. But the \(\rm\TeX\) engine and user model remain mostly the same: you edit a plain text file with markup in it and \(\rm\TeX\) compiles that and spits out something you can preview on your screen, or print to your printer. These days you can edit your text on the Internet if you want, and you can format it on your iPhone, if you want, but the basic system remains the same. It turns out there isn't a huge audience for anything else.

I came back to \(\rm\TeX\) for the dumbest of all possible reasons. I like to read papers about physics (and math, and astronomy, and sometimes computers). While these papers are now easy to find and store electronically (you go here. Give them money, they do great work), they are sometimes not the most readable artifacts. My complaints include:

  1. Two column layouts are the standard for journals, but are a pain to read on screens.
  2. Most older papers do not have hyperlinks for various things like references. Most newer papers do have hyperlinks but they use the default formatting for them, which is hideous.
  3. I've never really liked Knuth's typefaces (more on this later). I did my dissertation in Times. More modern \(\rm\TeX\) lets you use my favorite font of all time, Palatino, for both text and math. It is glorious.

So if I find a paper, and if the source code for the paper is available (and it usually is at the arXiv, which is amazing), then I will noodle with it for 10 minutes to try and make it better. Or give up. What is remarkable is that about 90% of the time you can take a ten or twenty year old paper that was originally written in some journal format and redo it without much trouble. The fact that \(\rm\TeX\) and its surrounding tools (esp. the \(\rm\LaTeX\) macros) have been stable enough over such a long period of time to make this possible turns out to be one of the most important parts of the Knuth legacy, IMHO. For better or worse, he froze the software and its formats, and they have become semi-accidentally archival in a way that no one would have considered possible.
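For the curious, here is a rough sketch of the kind of preamble that addresses the three complaints above. It is an illustration, not the setup used for any particular paper: the `geometry` margins and the link colors are arbitrary choices, and `mathpazo` is assumed here as one of the long-standing Palatino-based text-and-math packages.

```latex
% A minimal sketch, not any particular paper's preamble.
\documentclass[11pt]{article}        % article is single column unless "twocolumn" is passed
\usepackage[T1]{fontenc}
\usepackage[margin=1.25in]{geometry} % margins picked arbitrarily for screen reading
\usepackage{mathpazo}                % Palatino text with matching math fonts (assumed choice)
\usepackage[colorlinks=true,
            linkcolor=blue,
            citecolor=blue,
            urlcolor=blue]{hyperref} % colored link text instead of the default boxes

\begin{document}

\section{A test section}\label{sec:test}

Section~\ref{sec:test} should now be a clickable, colored link, and the
math should match the text face:
\[
  E = \frac{p^{2}}{2m} + V(x).
\]

\end{document}
```

With a real paper the work is mostly swapping the journal's document class and font packages for lines like these and then chasing down whatever macros the journal class used to provide.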

Part 5: Knuth and Me, Revisited

Inevitably using \(\rm\TeX\) even casually brought my brain back around to checking in on Knuth. You can't really avoid it. In the twenty five years since I looked last he had apparently:

  1. Finished off his software, mostly.
  2. Published five books about \(\rm\TeX\) and the fonts.
  3. Co-written a textbook on math for Computer Science.
  4. Re-published the five books about \(\rm\TeX\) and the fonts in 2000.
  5. Finally redone Volumes 1 to 3, and started Volume 4.
  6. Published a large series of retrospective collections, mostly about his academic work, one of which is just about the \(\rm\TeX\) project and its history. That's where most of my story came from.
  7. Rebuilt the fake computer and assembly language used in his books to study "real" code.
  8. Who knows what else.

In a fit of nostalgia over Christmas (and because my boss told me to do "something stupid") I bought a new set of the Knuth books to look at again, along with a new copy of the \(\TeX book\). I also got the Digital Typography volume, which has a lot of neat old historical material in it.

The Art of Computer Programming is as good as ever. In addition, even though I've never been a fan of the Computer Modern typefaces, for whatever reason in those specific books they look good. They look especially good printed, but are also excellent in PDF form, which is a rare thing. There is something about the entire page design of the books that holds them together and makes you forget about how the type still looks a bit off compared to what is in the classic books that Knuth so admired.

Digital Typography also looks fine. It has that more generic "I was formatted with \(\rm\TeX\)" look to it. But it's fine.

In contrast the new editions of the \(\TeX book\) and its companions look awful. They are clearly not printed by the same machines and on the same paper used for TAOCP. The result looks like something I could pull off my inkjet fax machine if I wanted to. A disappointment.

Having come full circle, my only regret now is that I lost the source code to my PhD dissertation twenty years ago in a bad laptop move. It would have been cool to see how it looks in Palatino, with modern outline fonts.

Part 6: \(\rm\TeX\) and Me, Redux

The rest of the story is mostly about fonts. In 1990 when I did my PhD thesis the main set of typefaces available for \(\rm\TeX\) that would support both text and mathematics was Computer Modern. There were a few sets of fonts that you could buy for money, but no graduate student has enough money to do that. It turns out that creating a set of typefaces that work well for both text and math in \(\rm\TeX\) is a lot of hard detail work. Very few font sets even today get everything right. It's long been my opinion that Computer Modern, especially before the later improvements, has never had the nicest set of letters to look at. But you have to admire all of the meticulous work that Knuth (and the folks at the AMS) did to get its support for \(\rm\TeX\)'s mathematical alphabets and layout right.

I compromised by using PostScript fonts for text but Computer Modern for math, which a lot of people have done, but which in fact is not a great choice. I still see this done in some books published in two-thousand-and-eighteen and I just frown inside.

These days there are about a dozen different free sets of typefaces that also have good support for the wide range of math characters that are available in Computer Modern, but are based on text faces that you might like better. As I said twice above, my favorite is an old package that is based on Palatino, but you can also get your Times, Garamond, Baskerville, Charter and many more. Check them out here: http://www.tug.dk/FontCatalogue/.

There are also two well known and widely used commercial font sets: Lucida and the various versions of Minion. Both of these also look great, and I got interested in Lucida because its text face is perhaps a bit influenced by Palatino, since the designers were students of Zapf. It turns out that the Lucida people also have had a long relationship with Knuth so you can buy a package of their fonts specifically for \(\rm\TeX\) from the \(\rm\TeX\) Users Group and even get half off if you are a member. I've never been much for Users Groups but this seemed like a win so I grabbed a discounted first year membership and the fonts.

I like the Lucida fonts a lot. Maybe I'll write a book to use them (that would be dumb). But I might still like Palatino more. Fiddling with fonts is a hobby that I could spend altogether too much time indulging in. Luckily there is a guy on the Internet who has done it all for me. Look at the print samples here and you never have to fiddle with fonts yourself again.

Surprisingly, I also like the back issues of the TUG newsletter. There are old stories about running \(\rm\TeX\) on old mainframes that took 10 seconds to process each page, the pain and suffering some people put themselves through to try and modernize the code in the \(\rm\TeX\) engine itself (mostly to no avail), and old letters to Brian Reid (an inside joke). You can get these stories without paying the membership fee (back issues are free to download), but tossing a bit of monetary support towards the most niche of niche publications is a good, if useless, way to feel good about yourself. Since its inception more than forty years ago \(\rm\TeX\) has also become something that history will study, and these TUGboat publications will form a large part of the primary source material for those studies.

To sum up, I have realized that it's not really fair to call \(\rm\TeX\) a Yak Shave. While it certainly started out as a classic yak shaving project (a small side project in support of a much larger magnum opus), it ended up being arguably the more important artifact. Knuth's books will always have an audience that is primarily limited to people studying programming and computer science. \(\rm\TeX\), on the other hand, has had a much wider impact. This quote from Charles Bigelow, one of the people who designed the Lucida fonts, supports this anecdotal conclusion:

A personal anecdote to support that claim: My neighbor is a retired mathematician, Norman Alling. He wrote a book on real elliptic curves and taught himself \(\rm\TeX\) in his 50s, so he could compose his book and papers himself, and he still uses \(\rm\TeX\) now, in his 80s. He says \(\rm\TeX\) liberated math journals and authors from dependence on commercial math typesetting, which was slow, expensive, and fraught with typographical errors needing proofreading and correction. When I told him that some people suggested that Knuth could have better spent his time finishing the Art of Computer Programming books instead of spending a decade developing \(\rm\TeX\), he replied: Oh no, \(\rm\TeX\) liberated so many mathematicians and scientists from the bottleneck of typesetting that it was a great boon to all of math and science, more important for the world-wide science and technical professions than Knuth’s unpublished books on computing, however excellent they might be\({}^{5}\).

Further Reading

You can read more about various aspects of the history of \(\rm\TeX\) from the early days until now in the PDFs linked below:

  1. 25 Years of \(\rm\TeX\) and METAFONT

  2. The \(\rm\TeX\) Family in 2009

  3. \(\rm E\)-\(\rm\TeX\): Guidelines for Future \(\rm\TeX\) Extensions -- revisited

  4. \(\rm\TeX\): A Branch in Desktop Publishing Evolution Part 1

  5. \(\rm\TeX\): A Branch in Desktop Publishing Evolution Part 2


  1. By algorithms here I do not mean the current usage where what we mean by an "algorithm" is some intern-built glorified gradient following engine driven by terabytes of data collected by automatically surveilling the actions of millions of people while they innocently use their computers on a world-wide information network. No, I mean specific programs written to do specific tasks, which are usually specified in some semi-formal mathematical language. You can then ask the question "how fast will these programs run?" ... and this is the question that the Knuth books explore in incredible detail and with incredible precision.

  2. Here are two cool youtube videos about how the Monotype machine works.

    Keyboard: https://www.youtube.com/watch?v=LcphfMlOzk4

    Casting machine: https://www.youtube.com/watch?v=M9DV95IEKGU

  3. Monotype Modern 8A was a set of typefaces used by Knuth's publisher, Addison-Wesley, for a lot of their technical books through the 50s and 60s. You can find examples at archive.org if you are curious.

  4. Knuth's so-called literate programming system is one of his dumber ideas, but also one that he seems to love the most. But that's a subject for a different rant.

  5. The whole interview is here.
