In part 1 and part 2 we tried to set up enough of the mathematical formalism of quantum mechanics to be able to talk about the measurement paradox in a reasonably precise way. If you were smart and skipped ahead to here you can now get the whole answer without reading through all that other tedious nonsense.

For reference, here are the rules that we currently know about quantum mechanics:

States are vectors in a Hilbert space, usually over \mathbb C.

Observables are self-adjoint linear operators on that space.

The possible values of observables are the eigenvalues of the corresponding operator, and the eigenvectors are the states that achieve those values. In addition, for the operators that represent observables, we can find eigenvectors that form an orthonormal basis of the underlying state space.

There is a special observable for the energy of the system whose operator we call H, for the Hamiltonian. Time evolution of states is then given by the Schrödinger equation.

Now we’ll finally talk about measurement.

As before, I am the furthest thing from an expert on this subject. I’m just trying to summarize some interesting stuff and hoping that I’m not too wrong. I’ll provide a list of more better sources at the end.

In quantum mechanics measurements are the connection between eigen-things and observables. We interpret the eigenvalues of the operator representing an observable as the values that we can see from that observable in experiments. In addition, if the system is in a state which is an eigenvector of the operator, then the value you get from the observable will always be the corresponding eigenvalue.

The simplest model of measurement in quantum systems is to say that a measurement is represented by acting with a single operator representing the observable on a single vector representing the state of the system. In this simple model we are doing “idealized” measurements (simple operators) on “pure” states (simple vectors). There are generalizations of both of these ideas that you can pursue if you are interested. See the further reading.

If we perform a measurement on a system that is in a state represented by an eigenvector of the operator, we always get absolutely determined and well defined answers.

For example let’s say we are in a system where the Hilbert space \cal H is two dimensional, so we can represent it as \mathbb C^2 with scalars from \mathbb C. Any basis for this space needs only two vectors, and we will use the standard ones: | 0 \rangle = \begin{pmatrix}1\\ 0\end{pmatrix} and | 1 \rangle = \begin{pmatrix}0\\ 1\end{pmatrix}

Suppose we have some operator S such that | 0 \rangle and | 1 \rangle are its eigenvectors with eigenvalues \lambda_0 and \lambda_1. Then we know that if we measure either | 0 \rangle or | 1 \rangle with S we’ll get the corresponding eigenvalue with probability 100%.

That is:

S | 0 \rangle = \lambda_0 | 0 \rangle

and

S | 1 \rangle = \lambda_1 | 1 \rangle

But, quantum states come in Hilbert spaces, which are linear. This means that we also have to figure out what to do if our state vector is any linear combination of the eigenvectors. So what if we had a state like this:

c_0 | 0 \rangle + c_1 | 1 \rangle

where c_0 and c_1 are arbitrary complex constants (not both zero)? In this case the result of doing a measurement will either be the eigenvalue \lambda_0 with some probability p_0 or \lambda_1 with some other probability p_1.

The Born rule then states that the probability of getting \lambda_0 is

p_0 = { |c_0|^2 \over |c_0|^2 + |c_1|^2 }

and the probability of getting \lambda_1 is

p_1 = { |c_1|^2 \over |c_0|^2 + |c_1|^2 } .

We have seen a version of this rule before, in part 1, but this time I normalized the probabilities like a good boy (so that they add up to 1).
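Here is a minimal Python sketch of the two-state Born rule as written above. The function name `born_probabilities` is just something I made up for illustration, and the sample coefficients are arbitrary.

```python
def born_probabilities(c0: complex, c1: complex) -> tuple[float, float]:
    """Return (p0, p1) for the state c0|0> + c1|1>, normalized as in the text."""
    w0 = abs(c0) ** 2
    w1 = abs(c1) ** 2
    total = w0 + w1
    if total == 0:
        raise ValueError("the zero vector does not describe a state")
    return w0 / total, w1 / total


# Arbitrary example: c0 = 3 and c1 = 4i, so the weights are 9 and 16 out of 25.
p0, p1 = born_probabilities(3, 4j)
print(p0, p1)
```

Because of the division by |c_0|^2 + |c_1|^2, the two numbers always add up to 1 no matter how the coefficients were scaled.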

One last puzzle that should be bothering you is the question of whether we can represent *any* state as a linear combination of the eigenvectors of the operator. It turns out we can, because we specified that observables are self-adjoint, so we can invoke the spectral theorem from part 2 which says that given an arbitrary state \psi \in \cal H we can always write the state as a linear combination of the eigenvectors.

In summary: given an arbitrary state vector \psi \in \cal H and an observable represented by an operator S you can calculate the behavior of S on \psi by first expressing \psi as a linear combination of eigenvectors of S (because you can find eigenvectors that form a basis) and then applying the Born rule.

So in our example above, where the operator S has eigenvectors | 0 \rangle and | 1 \rangle, we can first write \psi like this:

\psi = c_0 | 0 \rangle + c_1 | 1 \rangle

And then we use the Born rule to compute the measurement probabilities.

The most famous two-state system in the quantum mechanics literature is the so-called “spin 1\over 2” system. The behavior of these systems was first explored in the Stern-Gerlach experiment. In this experiment you shoot electrons (really atoms with a single free electron) through a non-uniform magnetic field, and see where they end up on a screen on the other side. You would expect them to end up in some continuous distribution of possible points, but it turns out they end up in only one of two points, which we will call “up” and “down”. We’re just going to take this result for granted rather than trying to explain it right now.

We can imagine spin as being like a little arrow over the top of the electron pointing either “up” or “down” along a certain spatial axis (e.g. x, y, or z). The Stern-Gerlach device determines the state of this “arrow” by measuring the behavior of the electron in a magnetic field. So it’s sort of like a magnet … but not really.

The state space for this system is just \mathbb C^2. Each one of the spin states is some linear combination of | 0 \rangle and | 1\rangle above.

It also turns out that there are four convenient operators that we can use as observables: the identity, and a spin operator for each spatial axis which we will call S_x, S_y and S_z. For all the details of where these come from, you can read about the Pauli matrices.

The Pauli matrices are called \sigma_1, \sigma_2 and \sigma_3. And the spin operators S_x, S_y, and S_z are defined as

S_x = {\sigma_1 \over 2}, \quad S_y = {\sigma_2 \over 2}, \quad S_z = {\sigma_3 \over 2} .
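If you want to poke at these operators yourself, here is a small Python sketch. The explicit matrices are the usual textbook form of the Pauli matrices, which I have not written out in this post, and the helper names `scale` and `apply` are my own.

```python
# The Pauli matrices in their usual textbook form, as lists of rows.
SIGMA_1 = [[0, 1], [1, 0]]
SIGMA_2 = [[0, -1j], [1j, 0]]
SIGMA_3 = [[1, 0], [0, -1]]


def scale(matrix, factor):
    """Multiply every entry of a matrix by a number."""
    return [[factor * entry for entry in row] for row in matrix]


def apply(matrix, vector):
    """Matrix times column vector: one row-by-vector dot product per component."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Spin operators in units where hbar = 1.
S_X, S_Y, S_Z = (scale(sigma, 0.5) for sigma in (SIGMA_1, SIGMA_2, SIGMA_3))

print(apply(S_Z, [1, 0]))  # |0> comes back multiplied by +1/2
print(apply(S_Z, [0, 1]))  # |1> comes back multiplied by -1/2
```

So | 0 \rangle and | 1 \rangle are eigenvectors of S_z with eigenvalues +1/2 and -1/2, which are the only two values a z-spin measurement can return.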

I can’t decide if it’s a deep mathematical fact or just a strange coincidence of nature that \mathbb C^2 should have exactly three operators for spin measurements, one in each direction that we need. It seems a bit spooky that it worked out that way.

Note: in all of the computations below I’m leaving out factors of \hbar. This is a standard trick in physics texts … you can use units where \hbar = 1 and then put it back later if you want.

We measure spin using a box with a magnetic field in it. So, imagine that we have some box with one hole on the left, and two holes on the right. We send an electron in the left hole and it comes out the top hole if the spin is up, and the bottom hole if the spin is down. We have three kinds of boxes that each measure the spin in a different direction (again: x, y or z).

So the S_z box looks like this:

We start with a beam of particles where each particle is in a completely random state. Electrons (say) go in the left hole and the spin up stuff is directed out the top right hole and the spin down stuff comes out the bottom right hole. We can then consider what happens if we take a bunch of devices like this, chain them together, and take sequential measurements.

First suppose we put another S_z box right after the first one so that all of the particles that enter the second box come out of the {\small +} hole of the first box. What will happen here is that 100% of this beam will come out the {\small +} hole of the second box. This seems very reasonable, since they were all z-spin up particles.

This behavior might make you think that z-spin is a property that we can attach to the electron, perhaps for all time, like classical properties, and that this box acts like a filter that just reads off the property and sends the particles the right way. Keep this thought in your brain.

Next, we can see that the relationship of S_z to S_x is also straightforward. A particle that has a definite z-spin still has an undefined x-spin:

So here when we put a S_x box right after the S_z box and send all the z-spin up particles through we will get x-spin up half the time and x-spin down half the time. If you study the material on the Pauli matrices above this will make sense because it turns out that the eigenvectors of S_z can be written as a superposition of the S_x eigenvectors with coefficients that make these probabilities 1/2 (and vice versa). In particular:

|z_+\rangle = | 0 \rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \, {\rm and}\,\, |z_-\rangle = | 1 \rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}

|x_+\rangle = {1 \over \sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \, {\rm and}\,\, |x_-\rangle = {1 \over \sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}

From this we can figure out that:

|x_+\rangle = {1 \over \sqrt{2}} (|z_+\rangle + |z_-\rangle)

and

|z_+\rangle = {1 \over \sqrt{2}} (|x_+\rangle + |x_-\rangle)

The Born rule then tells us that measuring the x-spin of a z-spin up particle will get you x-spin up half the time and x-down half the time. Similarly, measuring the z-spin of an x-spin up particle will get you z-spin up half the time and z-spin down half the time.
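The 50/50 numbers can be checked in a few lines of Python. This sketch assumes the states are already normalized, in which case the Born rule reduces to the squared magnitude of an inner product. The function names are mine.

```python
import math


def inner(phi, psi):
    """<phi|psi>, with the complex conjugate taken on the first argument."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))


def probability(outcome, state):
    """Born rule for normalized vectors: |<outcome|state>|^2."""
    return abs(inner(outcome, state)) ** 2


r = 1 / math.sqrt(2)
z_up, z_down = [1, 0], [0, 1]
x_up, x_down = [r, r], [r, -r]

# x-spin of a z-up particle, then z-spin of an x-up particle.
print(probability(x_up, z_up), probability(x_down, z_up))
print(probability(z_up, x_up), probability(z_down, x_up))
```

All four numbers come out as 1/2, up to floating point rounding.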

Relationships like this also happen to be true for all of the eigenvectors of all the spin operators. Some of the references at the end go into these details.

Finally, we can push on this idea a bit more by adding yet another S_z box on the end of the experiment above. When we do this we get a result that is somewhat surprising.

We might think that all of the particles coming out of the S_x box should be z-spin “up” since we had filtered for those using the first box. Sadly, this is not the case. Measuring the x-spin seems to wipe away whatever z-spin we saw before. This is surprising. Somehow going through the S_x box has made the z-spin undefined again, and we go back to 50/50 instead of 100% spin up.

So now our problem is this: what is going on in the last spin experiment?

We can interpret the first two experiments as behaving like sequential filters. The first z-spin box filters out just the particles with spin-up, and then we feed those to the second box (either z or x) and get the expected answer.

In order to make sense of the third experiment it seems like we need to posit that measurements in quantum mechanics have side effects on the systems that they measure. How can we account for the fact that the z-up property that the particles have before measuring the x-spin seems to disappear after we measure the x-spin?

The standard answer to this question goes something like this:

- We start with particles with some arbitrary spin state.
- But the particles that come out of the {\small +} hole of the first z-spin box have a definite spin state of |z_+\rangle, or z-up.
- Thus if the second box measures z-spin again, as in the first experiment, all the particles are spin up, and they all come out of the z-up hole.
- But, if the second box is an x-spin box, as in the second experiment, then since |z_+\rangle = {1 \over \sqrt{2}} (|x_+\rangle + |x_-\rangle), the x-spin is indeterminate, and we go back to a 50/50 split.
- Finally, if we now believe that measuring the spin also resets the spin state of the particle, like in step 2 above, then the particles coming out of the {\small +} hole of the x-spin box will have x-spin up, so their state will be |x_+\rangle = {1 \over \sqrt{2}} (|z_+\rangle + |z_-\rangle), which is why in the third and last box the z-spin is indeterminate again.

Thus, we are led to ponder adding another rule to the four we already had for how quantum mechanics works:

Suppose we have a quantum system that is in some state \psi and we perform a measurement on the system for an observable O. Then the result of this measurement will be one of the eigenvalues \lambda of O with a probability determined by the Born rule. In addition, *after* the measurement the system will evolve to a new state \psi', which will be the eigenvector that corresponds to the eigenvalue that we obtained.

This is, of course, the (in)famous “collapse of the wave function”, and with the background that I have made you slog through it should really be bothering you now.
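To see the collapse rule doing its job, here is a rough Python sketch of the three-box chain. It is only bookkeeping under the rule as stated: `measure` is a made-up helper that returns each outcome's Born probability together with the eigenvector the state collapses to, and it assumes the eigenvectors are orthonormal.

```python
import math


def inner(phi, psi):
    """<phi|psi>, with the complex conjugate taken on the first argument."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))


def measure(state, eigenvectors):
    """Return one (probability, post_measurement_state) pair per outcome.

    The probability comes from the Born rule, the new state from the
    collapse rule. Assumes the eigenvectors are orthonormal.
    """
    norm_sq = abs(inner(state, state))
    return [(abs(inner(e, state)) ** 2 / norm_sq, e) for e in eigenvectors]


r = 1 / math.sqrt(2)
Z_BASIS = [[1, 0], [0, 1]]    # |z+>, |z->
X_BASIS = [[r, r], [r, -r]]   # |x+>, |x->

after_first_z = Z_BASIS[0]    # keep only the + beam from the first z box
(p_x_up, after_x), _ = measure(after_first_z, X_BASIS)
(p_z_up, _), (p_z_down, _) = measure(after_x, Z_BASIS)
print(p_x_up, p_z_up, p_z_down)  # each is 1/2 up to rounding
```

If you skip the x box and call `measure(after_first_z, Z_BASIS)` directly, the z-up outcome gets probability 1, which is the two-z-box result from earlier.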

We seem to need this rule, along with the original rule about eigenvalues and eigenvectors to make our formalism agree with the following general *experimental* fact:

Whenever we measure a quantum system we always get one definite answer, and if we measure the system again in the same way, we get the same single answer again.

The problem is that the collapse rule completely contradicts our existing time evolution rule, which says that everything evolves continuously and linearly via the Schrödinger equation:

i \hbar \frac{\partial}{\partial t} | \psi(t) \rangle = H | \psi(t) \rangle .

This equation can do a lot of things, but the one thing it cannot do is take a state like this

|\psi\rangle = c_1|\psi_1 \rangle + c_2|\psi_2 \rangle

and remove the superposition. With that equation we can only ever end up in another superposition state, like this:

|\psi'\rangle = c_1' |\psi_1'\rangle + c_2' |\psi_2'\rangle .

To bring this back to our example, suppose our S_x box is modeled as a simple quantum system with three states: |m_0\rangle for when the box is ready to measure something, |m_+\rangle for when it has measured spin up, and |m_-\rangle for when it has measured spin down. Here the m is for machine, or measurement.

In our experiment, at the second box, we start with a particle in the state

|z_+\rangle = {1 \over \sqrt{2}} (|x_+\rangle + |x_-\rangle)

and send it into the S_x box, which starts in the state |m_0\rangle. So the state of the composite system becomes the superposition:

{1 \over \sqrt{2}} (|x_+\rangle + |x_-\rangle)|m_0\rangle .

This state means “the particle is in a superposition of x-spin up and x-spin down, and the measuring device is ready to measure it.” ^{1}

If we believe that Schrödinger evolution is the only rule we have, then this state can only evolve like this:

{1 \over \sqrt{2}} (|x_+\rangle + |x_-\rangle)|m_0\rangle \quad \xrightarrow{\hspace 20pt} \quad {1 \over \sqrt{2}} ( |x_+\rangle|m_+\rangle + |x_-\rangle|m_-\rangle ) .

That is, the box and the particle must evolve to a superposition of “spin up” and “measured spin up” with “spin down” and “measured spin down”. The Schrödinger equation never removes the superposition.^{2}
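The role linearity plays here can be made concrete with a toy Python sketch. Everything in it is an assumption for illustration: composite states are dictionaries from (particle, machine) labels to amplitudes, and `INTERACTION` is a hypothetical idealized measurement interaction that is only defined on the two "ready" basis states. A real unitary would have to say what happens to every basis state.

```python
import math

# Hypothetical interaction: what the box does to each definite x-spin state.
INTERACTION = {
    ("x+", "m0"): ("x+", "m+"),
    ("x-", "m0"): ("x-", "m-"),
}


def evolve(state):
    """Extend INTERACTION linearly to a superposition {basis_label: amplitude}."""
    result = {}
    for label, amplitude in state.items():
        new_label = INTERACTION[label]
        # Linearity: each amplitude just rides along with its basis state.
        result[new_label] = result.get(new_label, 0) + amplitude
    return result


r = 1 / math.sqrt(2)
before = {("x+", "m0"): r, ("x-", "m0"): r}
after = evolve(before)
print(after)
```

Both branches survive with their original amplitudes. Nothing in `evolve` can pick one of them, and that is the whole point.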

But we never see states like this. Particles go into measuring devices, and those devices give us a single answer with a single value. The world is not full of superposed Stern-Gerlach devices, or CCDs, or TV screens. Furthermore: cats, famously, are never both alive and dead.

Instead, the particle enters the device and we see a universe where only one particle leaves and the device tells us a single definitive answer: either |x_+\rangle or |x_-\rangle. That is, using our notation above, the real world time evolution always looks like this:

{1 \over \sqrt{2}} (|x_+\rangle + |x_-\rangle)|m_0\rangle \quad \xrightarrow{\hspace 20pt} \quad |x_+\rangle|m_+\rangle

or

{1 \over \sqrt{2}} (|x_+\rangle + |x_-\rangle)|m_0\rangle \quad \xrightarrow{\hspace 20pt} \quad |x_-\rangle|m_-\rangle

In order to make this happen, we have to add something like the collapse rule, or some other story, to the theory.

So this, dear friends, is the measurement problem. It is a fundamental contradiction between the observed behavior of real systems in the world, and what the Schrödinger equation allows.

The literature on the “interpretation of quantum mechanics” is of course full of deep thoughts about the questions that the measurement problem raises. I could not possibly do more than unfairly caricature the various possible stances that one could have about this question, so that’s what I will do. Here are some things we can do:

We can take the collapse rule as a postulate and until we understand how measurement works, just use the rules and try to be happy. This view is often called the “Copenhagen” interpretation, although that’s not really right and the Copenhagen story is actually a lot more complicated than this. A better name for this view is the “standard” or “textbook” viewpoint.

We can say that quantum states are mainly a tool for describing the statistical behavior of experiments. Ballentine’s book, which I referenced in part 2, has a careful exposition of one version of this line of thought where the wave function only describes statistical ensembles of systems. There is, of course, a spectrum of different opinions about whether quantum mechanics describes any physical reality at all, or just the behavior of experiments.

We can say that the collapse rule is either not needed or not contradictory because quantum states are not really things that exist in the world. Rather, the quantum state is just a way of describing what we, or some set of rational agents, believe about the world. The most recent version of this idea is probably QBism.

We can think that wave functions do not describe the entire state of the system. Instead, there is some other part of the state that gives systems definite measured properties. The most popular version of this idea is the “pilot wave” or “Bohmian” version of quantum mechanics.

We can decide that superpositions don’t actually collapse, we just can’t see the other branches. This is the Everett and/or the “Many Worlds” idea.

We can say that wave functions actually collapse through some random physical process, and we can use this fact to derive the measurement behavior (and perhaps the Born rule). The most famous theory like this is the GRW stuff.

There are dozens more ideas that I will not list here because I don’t understand them well enough to list them.

If forced to take a stance I would probably say that I am most sympathetic to the more “ontological” theories, like Bohm or Everett. My least favorite idea is probably QBism because I have a hard time being enthusiastic about a world where everything is just the knowledge and credences of rational actors. But, in between these two extremes I enjoy the careful and pragmatic thinking that’s been done about the nature of experiments and measurement in quantum theory. I used Ballentine’s book as an example of this, but there is a lot more where that came from (see Peres for example). I feel like what we really need to do is to attack the core question of what is really happening in quantum and quantum/classical interactions. Until we have a better understanding of that I think we’ll never figure out this puzzle.

When in doubt, I will just appeal to my favorite quantum computer nerd, Scott Aaronson, for his point of view, which seems right.

I left out a lot of important details related to the structure of Hilbert space. In the finite dimensional case they don’t matter too much but they are critical in the infinite dimensional case. Watch Schuller’s lectures on quantum mechanics to fill those in.

I really only covered the simplest possible models of quantum states, observables and measurements. Mixed states, density operators, POVMs and all that are missing. Schuller’s lectures or any of the more mathematical books that I listed cover this.

I left out the uncertainty principle, which is kind of a big part of the story to skip. You can talk about it in the context of the spin operators but it’s a lot of work and not directly related to the puzzle that I was trying to get to.

I left out the entire huge world of *entangled* states because I did not want to introduce any more formalism. Entanglement, Bell’s theorem and all that is also just too big a subject to mention and not go into, so I left it out. Maybe we’ll cover that in a future part 4.

I never mentioned decoherence. I am a bad person.

I played fast and loose with normalization when talking about quantum states and operators. I should have been much more careful, but I’m lazy.

I wish I could have talked about the two slit experiment. But, I’d have done a lousy job so go read Feynman instead.

Finally, you can do an experiment similar to the chained spin-box experiment with polarized light. Watch here.

Some more reading for you:

If you want to go all the way to the beginning with the original sources, both of the books by Dirac (or look at the Google Books link which is likely to be more reliable) and von Neumann are still pretty readable.

Travis Norsen’s Foundations of Quantum Mechanics is a great introduction to this material. A good combination of nuts and bolts physics and discussions of the conceptual issues.

David Albert’s Quantum Mechanics and Experience (also at amazon) has a nice abstracted description of the spin-box experiment that I have butchered above. This one goes well with Norsen.

Sakurai’s Modern Quantum Mechanics starts with a good discussion of the spin experiments I used as an example.

An older book, Quantum mechanics and the particles of nature, by Sudbery, goes at this from a point of view that I like. Hard to find though.

Hughes’ The Structure and Interpretation of Quantum Mechanics also starts with spin but is a more philosophical look at the material.

The Stanford Encyclopedia of Philosophy has a lot of material on quantum mechanics and its interpretation. Their summary page is also a bit shorter, yet also more detailed, than my effort here.

You should read this paper by Leifer just for the delicious pun in the title. But it’s also a great breakdown of the various ways that people talk about and interpret the quantum state.

This much more technical paper by Landsman also addresses the very complicated question of how classical and quantum states are related. He has an open access book that expands on these ideas, especially in the chapter on the measurement problem. I don’t really understand any of this, but it seems like the kind of work that needs to be done.

^{1} Those in the know will notice that I have not really explained what this notation for product states that I am using here means. I did not have the space to explain tensor products and entanglement, which is a shame because along with measurement entanglement is the second huge conceptual puzzle in quantum mechanics.

^{2} For those keeping track, this is the formula I’ve been trying to get to this whole time. Was the 9000 words worth it?

Almost every book or article about quantum mechanics seems to start with a passage like this:

Quantum mechanics is arguably the most successful physical theory in the history of science but strangely, no one really seems to agree about how it works.

And now I’ve done it to you too. One of the main reasons people write this sentence over and over again is because of what is called *the measurement problem*. Here is a way to state the measurement problem, which I will then try to explain to you.

The measurement problem refers to the following facts, which seem to contradict each other:

On the one hand, when we measure quantum systems we always see one answer.

On the other hand, if you want to use the regular rules of time evolution in quantum mechanics to describe measurements, then there are states for which measurements should not give you one answer.

In particular, measuring states that describe a *superposition* (see below) can cause a lot of confusion.

In part 1 of this series I gave you a bit of the history and motivation behind the development of quantum mechanics. It followed the development of the theory the way a lot of physics text books do, with lots of differential equations and other scary math. We will now leave all that behind us.

My plan here is to describe enough of the mathematical formalism of quantum mechanics in enough detail to express the measurement problem in a way that is relatively rigorous. This mostly boils down to a lot of tedious and basic facts about linear algebra, instead of all the scary differential equations from part 1. Personally I find the algebraic material a lot easier to understand than the more difficult differential equation solving. It will still be an abstract slog, but I’ll try to leave out enough of the really boring details to keep it light.

As with my other technical expositions on subjects that are not about computers, I am the furthest thing from an expert on this subject. I’m just organizing what I think are the most interesting ideas about what is going on here, and hoping that I’m not too wrong. I’ll provide a list of more better sources at the end.

The rules of quantum mechanics are about *states* and *observables*. These are both described by objects from a fancy sort of linear algebra. This involves a lot of axioms that are interesting (not really) but not needed for our purposes. To try and keep this section a bit shorter and less tedious I link out to Wikipedia for many of the mathematical details, and just provide the highlights that we need here.

Quantum states live in a thing called a *Hilbert space*, which is a special kind of vector space. Observables are a particular kind of linear function, or *operator* on a Hilbert space.

The ingredients that make up a Hilbert space are:

A set of *scalars*. In this case it’s always the complex numbers (\mathbb C).

A set of *vectors*. Here the vectors are the wave functions.

A long list of rules about how we can combine vectors and scalars together. In particular vector spaces define a notion of addition (+) for vectors that obeys some nice rules (commutativity, associativity, blah blah blah), and a notion of multiplying vectors by scalars that also obeys some nice rules. For reference, you can find the rules here.

We denote Hilbert spaces with a script “H”, like this: \cal H, and we use greek letters, most popularly \psi to denote vectors in \cal H. For a reason named Paul Dirac, we will dress up vectors using a strange bracket notation like this: | \psi \rangle, or sometimes this way \langle \psi |. This is also how we wrote down the wave functions in part 1.

The most important thing about Hilbert spaces is that they are *linear*. What this means is that given any two vectors | \psi \rangle and | \phi \rangle and two scalars a and b, any expression like

a | \psi \rangle + b | \phi \rangle

is also a vector in \cal H.

This rule, it turns out, is the most important rule in Quantum Mechanics and is famously called the *superposition principle*. You will also see states that are written down this way called *superposition states*. But, this terminology is more magic sounding than it needs to be. This is just a linear combination of two states, and the fact that you always get another state is also a straightforward consequence of the form of the Schrödinger equation (it is what we call a *linear* differential equation). Linearity plays a big role in the eventual measurement puzzle, so store that away in your memory for later.

The second most important thing about Hilbert spaces is that they define an *inner product* operation that allows us to define things like length and angle. We write this product this way:

\langle \psi | \phi \rangle

and its value is either a real or complex number.

Now we see a bit of the utility of this strange bracket notation. In Dirac’s terminology the | \psi \rangle is a “ket” or “ket vector” and the \langle \psi | is a “bra”. So you put them together and you get a “bra ket” or “braket”. So all of this silliness is in service of a bad pun. Those wacky physicists thought this joke was so funny that we’ve been stuck with this notation for almost a hundred years now.

There is also some subtle math that you have to do to make sure that the “bra” \langle \psi | is a thing that makes sense in this context, but let’s assume we have done that and it has all worked out.

As always, I refer you to wikipedia for the comprehensive list of important inner product facts.

We can use the inner product to define a notion of distance in a Hilbert space that is similar to the familiar “Euclidean” distance that they teach you in high school. For a given vector \psi the norm of \psi is written \lVert \psi \rVert and is defined as

\lVert \psi \rVert = \sqrt{\langle \psi | \psi \rangle}

Since \langle \psi | \psi \rangle is always a non-negative real number this is well-defined. You can also define the distance between two vectors in a Hilbert space as \lVert \psi - \phi \rVert.
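For vectors in \mathbb C^n all of this fits in a few lines of Python. This is a sketch using the standard inner product on \mathbb C^n, with the complex conjugate on the first argument, which is the usual physics convention. The function names are mine.

```python
import math


def inner(phi, psi):
    """Standard inner product <phi|psi> on C^n, conjugating the first argument."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(phi, psi))


def norm(psi):
    """||psi|| = sqrt(<psi|psi>); the imaginary part of <psi|psi> is zero."""
    return math.sqrt(inner(psi, psi).real)


def distance(psi, phi):
    """||psi - phi||, the distance between two vectors."""
    return norm([a - b for a, b in zip(psi, phi)])


print(norm([3, 4j]))             # 5, just like a 3-4-5 triangle
print(distance([1, 0], [0, 1]))  # square root of 2
```

The conjugate is what makes \langle \psi | \psi \rangle come out real and non-negative even when the components are complex.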

The inner product and the norm will form the basis for how we compute probabilities using the Born rule, which we saw in part 1.

All of this nonsense with Hilbert spaces and inner products is motivated by wanting to do calculus and mathematical analysis on objects that are *functions* rather than plain numbers (or vectors of numbers). This comes up because the big conceptual shift in quantum mechanics was moving from properties that had values which were real numbers to properties described by complex valued *functions* or *wave functions*. The issue was that we know how to do calculus over the reals, but calculus with function valued objects is a stranger thing. *Functional analysis* is the area of mathematics that studies this, and Hilbert spaces come from functional analysis. In the 30s von Neumann realized that functional analysis, Hilbert spaces, and operators were the right tools to use to build a unified basis for quantum mechanics. And that’s what he did in his famous book.

If we wanted to actually prove some of the things that I will later claim to be true about Hilbert spaces and operators we would need some of the more technical results from functional analysis. Doing such proofs is way above my pay grade so I’m mostly ignoring such things for now. But at the end of this whole story I’ll make a list of things that I left out.

After working out the mathematical basis for quantum theory Von Neumann went on to invent the dominant model that we still use to describe computers. So think about that next time you are feeling yourself after having written some clever piece of code.

The third important fact about Hilbert spaces that we will need is the idea of a *basis*. In a Hilbert space (really any vector space) a *basis* is a set of vectors that one can use to represent any other vector in the space using linear combinations. If this set is *finite*, meaning that you can count up the number of basis vectors you need with your fingers, then we say that the vector space is “finite dimensional”.

The most familiar example of a finite dimensional Hilbert space is \mathbb C^n, which is where we do a lot of physics. Here the basis that we all know about is the one made up of the unit vectors for each possible axis direction in the space. So, for n=3 the unit vectors are

\begin{pmatrix} 1 \\ 0 \\ 0 \\ \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \\ \end{pmatrix} \quad {\rm and} \quad \begin{pmatrix} 0 \\ 0 \\ 1 \\ \end{pmatrix}

To write down any vector v in the space all we need is three numbers, one to multiply each unit vector:

v = \begin{pmatrix} a \\ b \\ c \\ \end{pmatrix} = a\begin{pmatrix} 1 \\ 0 \\ 0 \\ \end{pmatrix} + b\begin{pmatrix} 0 \\ 1 \\ 0 \\ \end{pmatrix} + c \begin{pmatrix} 0 \\ 0 \\ 1 \\ \end{pmatrix}

By convention we write vectors in columns, which will make more sense in the next section.

And thus we have built the standard sort of coordinate system that we all know and love from 10th grade math.

This sort of basis for \mathbb C^n also has the property that it is *orthonormal*, meaning that with the standard inner product all of the unit vectors have norm 1 and are orthogonal to each other (their mutual inner products are always zero).
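Orthonormality is easy to check mechanically. In this Python sketch, `standard_basis` and `is_orthonormal` are names I invented, and the tolerance is an arbitrary choice to absorb floating point rounding.

```python
def inner(phi, psi):
    """Standard inner product <phi|psi> on C^n, conjugating the first argument."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(phi, psi))


def standard_basis(n):
    """The n unit vectors of C^n, one per axis direction."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def is_orthonormal(vectors, tol=1e-12):
    """True if <u|v> is 1 when u and v are the same vector and 0 otherwise."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            expected = 1 if i == j else 0
            if abs(inner(u, v) - expected) > tol:
                return False
    return True


print(is_orthonormal(standard_basis(3)))  # True
print(is_orthonormal([[1, 0], [1, 1]]))   # False: still a basis, not orthonormal
```

The second example is a perfectly good basis for \mathbb C^2, which shows that being orthonormal is an extra property on top of being a basis.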

In the rest of this piece we will assume that all of our Hilbert spaces have an *orthonormal* basis and that they are finite dimensional. Of course, the more famous state spaces in quantum mechanics (for position and momentum) are infinite dimensional, which is the other reason Hilbert spaces became a thing. But we will not deal with any of that complication here.

In classical mechanics we did not think about observables too much. They were just simple numbers or lists of numbers that in principle you can just read off of the mathematical model that you are working with.

But, in quantum mechanics, observables, like the states before them, become a more abstract thing, and that thing is what we call a *self-adjoint linear operator* on the Hilbert space \cal H. All this means is that for everything we want to observe we have to find a function from \cal H to \cal H that is *linear* and also obeys some more technical rules that I will sort of define below.

Linearity we have seen before. This just means that if you have an operator O that takes a vector \psi and maps it to another vector, then you can move O in and out of linear combinations of vectors. In particular

O(\alpha \psi) = \alpha O (\psi)

and

O(\psi + \phi) = O(\psi) + O(\phi)

The “self-adjoint” (or *Hermitian*) part of the definition of observables is more technical to explain.

As we all know from basic linear algebra, in finite dimensional vector spaces you can, once you fix a basis, write linear operators down as a matrix of numbers. Then the action of the operator on any given vector is a new vector where each component of the new vector is the dot product of the original vector with the appropriate row of the matrix.

So the easiest operator to write down is the identity (\bf 1)… which just looks like the unit vector basis vectors written next to one another

{\bf 1} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ \end{pmatrix}

We can check that the application rule I outlined above works … here we write the vector we are acting on vertically for emphasis:

{\bf 1} \begin{pmatrix} a \\ b \\ c \\ \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ \end{pmatrix} = a\begin{pmatrix} 1 \\ 0 \\ 0 \\ \end{pmatrix} + b\begin{pmatrix} 0 \\ 1 \\ 0 \\ \end{pmatrix} + c \begin{pmatrix} 0 \\ 0 \\ 1 \\ \end{pmatrix}

So it works!
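The row-by-row rule and the two linearity rules from earlier are easy to check numerically. This is only a sketch: the matrix `O`, the vectors `psi` and `phi`, and the scalar `alpha` are arbitrary values I made up for illustration.

```python
def apply(matrix, vector):
    """Act with a matrix on a column vector: component i of the result is
    the dot product of row i of the matrix with the vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
O = [[2, 1j, 0], [0, 1, -1], [3, 0, 1j]]  # arbitrary illustrative operator

psi = [1 + 2j, 0, -1]
phi = [0.5, 1j, 2]
alpha = 2 - 1j

# The identity leaves every vector alone.
unchanged = apply(identity, psi)

# Linearity: scalars and sums move in and out of O.
scaled_lhs = apply(O, [alpha * x for x in psi])
scaled_rhs = [alpha * y for y in apply(O, psi)]
sum_lhs = apply(O, [x + y for x, y in zip(psi, phi)])
sum_rhs = [x + y for x, y in zip(apply(O, psi), apply(O, phi))]
```

The two `_lhs` lists agree with their `_rhs` partners, which is just the two linearity equations written out for one particular matrix.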

With this background in hand, we can define the *adjoint* of an operator A, which we write as A^* (math) or A^\dagger (physics). Anyway, the adjoint of A is an operator that obeys this rule:

\langle A \psi | \phi \rangle = \langle \psi | A^* \phi \rangle

for any two vectors \psi and \phi in \cal H.

In finite dimensional complex vector spaces (e.g. \mathbb C^n), where operators can be written down as matrices, you can visualize what the adjoint is by transposing the matrix representation (in an orthonormal basis) and taking the complex conjugate of every entry. This is not the cleanest way to define this object since the matrix representation is dependent on a basis, and we can (and did!) define the notion of an adjoint without referencing a basis at all. But it’s not the end of the world.

In infinite dimensional spaces and other more complicated situations finding the adjoint is more complicated. I’ll leave it at that.

A self-adjoint operator is just one whose adjoint is equal to itself. So it obeys the rule:

\langle A\, \psi | \phi \rangle = \langle \psi | A\, \phi \rangle .

We can remove the ^* because A = A^*.
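Here is a small sketch of the matrix version of all this, assuming an orthonormal basis so that the adjoint really is the conjugate transpose. The matrices `A` and `H` and the vectors are made-up examples, and `inner`, `apply` and `adjoint` are my own helper names.

```python
def inner(u, v):
    """Standard inner product on C^n (conjugate taken on the first argument)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply(matrix, vector):
    """Act with a square matrix on a column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def adjoint(matrix):
    """Conjugate transpose: swap rows and columns, conjugate each entry."""
    n = len(matrix)
    return [[matrix[j][i].conjugate() for j in range(n)] for i in range(n)]

A = [[1 + 1j, 2 + 0j], [3j, 4 - 2j]]  # not self-adjoint
psi = [1 + 0j, 1j]
phi = [2 - 1j, 0.5 + 0j]

# The defining rule of the adjoint: <A psi | phi> = <psi | A* phi>.
lhs = inner(apply(A, psi), phi)
rhs = inner(psi, apply(adjoint(A), phi))

# A self-adjoint example: equal to its own conjugate transpose.
H = [[2 + 0j, 1 - 1j], [1 + 1j, -1 + 0j]]
is_self_adjoint = adjoint(H) == H
```

For these particular numbers both `lhs` and `rhs` work out to -10.5i, and `H` passes the self-adjoint check while `A` would not.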

In a lot of physics books you will also see self-adjoint operators referred to as *Hermitian* operators. In the finite dimensional complex case the two terms are equivalent.

Self-adjoint operators have some nice properties for physics. The reason why has to do with eigen-things.

Linear operators map vectors to vectors in a fairly constrained way. You have some freedom in how you transform the vector, but you don’t have *total* freedom since whatever you do has to preserve linear combinations.

But, for every operator there might be a special set of vectors that map to some scalar multiplied by themselves. That is, for some operator A and vector \psi you will have

A \psi = \alpha \psi

where \alpha is just a scalar. What this means, in some sense, is that the operator maps the original vector to a multiple of itself. The only thing that changes is its length, or magnitude (and possibly its sign or complex phase, since \alpha can be negative or complex).

Vectors with this property are called *eigenvectors*, and the constants are called *eigenvalues*. Both words are derived from the German word “eigen” meaning “proper” or “characteristic”, but that doesn’t really matter. This is just one of those weird words that stuck around by habit.

Eigenvectors and eigenvalues come up in all kinds of contexts. They are important because they provide a way to characterize complicated transformations in a simpler way. If you have enough eigenvectors to form a basis you can in principle switch to working in that basis, where the transformation is a diagonal matrix, which is usually a simpler representation. The applications of this idea come up all over, from image processing to Google PageRank, to quantum mechanics.

The reason we wanted to have the operators that represent observables be self-adjoint above is that self-adjoint operators have two nice properties related to eigen-things.

All the eigenvalues of a self-adjoint operator are real-valued (even though our state space is over the complex numbers).

There is a famous theorem that says that every self-adjoint operator has a set of eigenvectors that form an *orthonormal basis* of the underlying Hilbert space. This theorem is called the *spectral* theorem and the set of eigenvalues of the operator is called its *spectrum*. This is a very important result for quantum mechanics.
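Both properties can be seen in the smallest interesting case. The sketch below uses a standard 2×2 self-adjoint matrix (the Pauli matrix usually called \sigma_y) and simply checks a pair of eigenvectors that I have written in by hand rather than computed; the helper names are mine.

```python
import math

def inner(u, v):
    """Standard inner product on C^n (conjugate taken on the first argument)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply(matrix, vector):
    """Act with a square matrix on a column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

# Self-adjoint, with entries that are not real.
sigma_y = [[0j, -1j], [1j, 0j]]

# Hand-written candidate (eigenvalue, eigenvector) pairs.
s = 1 / math.sqrt(2)
eigenpairs = [
    (+1.0, [s + 0j, s * 1j]),
    (-1.0, [s + 0j, -s * 1j]),
]

# How far A v is from (eigenvalue * v), component by component.
residuals = [
    max(abs(a - lam * b) for a, b in zip(apply(sigma_y, vec), vec))
    for lam, vec in eigenpairs
]

# Orthonormality: zero overlap between the two, unit length for each.
overlap = inner(eigenpairs[0][1], eigenpairs[1][1])
norms = [math.sqrt(inner(vec, vec).real) for _, vec in eigenpairs]
```

The eigenvalues are the real numbers +1 and -1 even though the matrix entries are imaginary, and the two eigenvectors are orthogonal and of length one, so together they form an orthonormal basis of \mathbb C^2.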

At this point you might be thinking to yourself, “I have seen this word *spectrum* before”. And you have. One of the earliest problems in quantum mechanics was to explain the spectral lines of the hydrogen atom. So you might be wondering, how do we get from these abstract quantum states and operators to energy? The answer is the next important rule of quantum mechanics, which we are already familiar with from part 1: there is a special observable for the energy of the system whose operator we call H, for the *Hamiltonian*. Time evolution of quantum states is then given by the Schrödinger equation:

i \hbar \frac{\partial}{\partial t} | \psi(t) \rangle = H | \psi(t) \rangle .

You will recall from part 1 that the wave functions, which we now know are the quantum states of a system, were all solutions to this equation.

Now, the trick to solving the hydrogen atom is first finding a Hamiltonian H that correctly describes the behavior of the electron in the atom. It turns out that when you do this H will be one of our coveted self-adjoint linear operators on the Hilbert space of wave functions. This means that there will be some set of states that obey this rule:

H | \psi \rangle = E | \psi \rangle

where E here is just a real number, rather than an operator. We use the letter E to stand for energy. These energies will be the energies that appear in the spectrum of the atom.
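One consequence worth seeing in numbers: if a state is an eigenvector of H with eigenvalue E, then multiplying it by the phase e^{-iEt/\hbar} gives a solution of the Schrödinger equation. The sketch below checks this with a finite difference. The units (\hbar = 1), the energy `E` and the vector `psi0` are all made up for illustration, and I never build H itself, I only use the fact that it acts on this state as multiplication by E.

```python
import cmath

HBAR = 1.0                 # natural units, an assumption for illustration
E = 2.5                    # hypothetical energy eigenvalue
psi0 = [0.6 + 0j, 0.8j]    # stands in for a state with H psi0 = E psi0

def psi(t):
    """Candidate solution: the eigenvector times a rotating complex phase."""
    phase = cmath.exp(-1j * E * t / HBAR)
    return [phase * c for c in psi0]

def schrodinger_residual(t, dt=1e-6):
    """Largest component of (i hbar d/dt psi - H psi), using a central
    finite difference for the time derivative."""
    plus, minus = psi(t + dt), psi(t - dt)
    lhs = [1j * HBAR * (p - m) / (2 * dt) for p, m in zip(plus, minus)]
    rhs = [E * c for c in psi(t)]  # H acts as multiplication by E here
    return max(abs(a - b) for a, b in zip(lhs, rhs))

def squared_magnitudes(t):
    """Squared size of each component of the state at time t."""
    return [abs(c) ** 2 for c in psi(t)]
```

The residual is tiny at any time you try, and `squared_magnitudes(t)` never changes, which is one way to see why these fixed-energy states are called stationary.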

So here is why we were going on about eigen-things before (and linear operators before that, and vector spaces before that). The Hamiltonian for the hydrogen atom H is a self-adjoint operator whose eigenvalues are the energies in the spectrum of the atom. The eigenvectors are the electron wave functions that define the fixed energy levels at which we see spectral lines. And an amazing fact about the world is that you can actually set up a model of the hydrogen atom so that things work out in exactly this way. The setup is somewhat technical and complicated, so I don’t cover that here. I’ll use a simpler system to describe the rest of what I want to talk about.

Speaking of which.

At this point we have put together almost all of the formalism that we need. But this post has gone on too long, so I am going to make you read yet another part to get to the real point of this entire exercise. Meanwhile, here is a quick summary of what we have so far:

States are vectors in a Hilbert space, usually over \mathbb C.

Observables are self-adjoint linear operators on that space.

The possible values of observables are the eigenvalues of the corresponding operator, and the eigenvectors are the states that achieve those values. In addition, for the operators that represent observables, we can find eigenvectors that form an orthonormal basis of the underlying state space. Which is really convenient.

There is a special observable for the energy of the system whose operator we call H, for the Hamiltonian. Time evolution of states is then given by the Schrödinger equation.

Of course, I *still* have not said anything about measurement, and you should be furious with me. I promise I will in part 3.

Here are some things I like.

Isham’s Lectures on Quantum Theory is a nice treatment of the subject that is more mathematically rigorous than most.

Peres and Ballentine are more “physics oriented” books that start from the algebraic point of view. Weinberg also covers this material from a more traditional point of view, and it’s a nice illustration of how the physics view and the algebraic view are related.

Scott Aaronson’s Quantum Computing since Democritus is a nice computer nerd’s view of the world.

Brian C. Hall’s book on Quantum Theory for Mathematicians covers a lot of the more technical details about Hilbert spaces and their operators in a more mathematically rigorous way.

Frederic Schuller’s lectures on quantum mechanics also give you a rigorous mathematical view of this material.

I got it into my head that I should try to explain part of the problem with quantum mechanics on this web site. I am, of course, no expert on this subject at all. But I wanted to do a relatively simple and shallow (but mostly correct) treatment, like my category theory tutorial. So, over the last few months I’ve taken a few different shots at it but never found a way to wind it up into a single coherent train of thought. I wanted to thread my way through the physical puzzles to the mathematical formalism and then end up at the particular formula that, in my mind, sums up at least one of the problems.

I finally realized that trying to fit the whole thing into a single stream of words is beyond my talents as a writer, or at least not a structure that fits well into a single page on this web site. So I decided to split it up. So this first part is just about the move from “classical” mechanics to quantum problems … and then one or more future pages will be about the rest.

As with my other technical expositions on subjects that are not about computers, I am the furthest thing from an expert on this subject. I’m just organizing what I think are the most interesting ideas about what is going on here, and hoping that I’m not too wrong. I’ll provide a list of more better sources at the end.

To understand why quantum mechanics has puzzled people for so long we first have to go back to the mechanics that you might or might not have learned in high school or college physics. You remember …

F = ma,

all those stupid force diagrams with boxes and ramps and ropes and stuff.

It turns out that what all of this nonsense was hiding (which they tell you about sophomore year in college if you major in physics) is that every single one of these problems can be set up so you put some numbers into a single black box, turn a crank, and every answer that you ever needed falls out the other side. This magic box is a set of *differential equations* that describe how the system you have described evolves in time. I am not going to go into the details of how differential equations work, because honestly I don’t know them. But, for reference they look something like this:

\frac {d {x}}{dt} = \frac{\partial H}{\partial p}, \quad \frac {d {p}}{dt} = - \frac{\partial H}{\partial x}

Here, x represents position and p represents momentum (momentum is the mass of the object times its velocity … p = mv. For some reason this is a more convenient way to work than with the velocity directly). H is called the *Hamiltonian*, named after the mathematician who made it up: William Rowan Hamilton. It is a measure of the total energy in the system.

What the formula says, basically, is that if you have a thing and you can express the energy of the thing in the right way, then given any specification of an initial position and velocity of the thing, I can tell you exactly where the thing will be later and how fast it will be moving. All I need is a computer and the formula.
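To make "a computer and the formula" concrete, here is a minimal sketch for the simplest system I could think of: a mass on a spring, with H = p^2/2m + kx^2/2. The choice of system, the unit mass and spring constant, the step size, and the stepping scheme (a semi-implicit, or "symplectic", Euler step) are all my assumptions, not anything special about the physics.

```python
import math

def dH_dx(x, p, k=1.0):
    """Partial derivative of H with respect to position."""
    return k * x

def dH_dp(x, p, m=1.0):
    """Partial derivative of H with respect to momentum."""
    return p / m

def energy(x, p, m=1.0, k=1.0):
    """The Hamiltonian itself: kinetic plus potential energy."""
    return p * p / (2 * m) + k * x * x / 2

def evolve(x, p, dt, steps):
    """Turn the crank: dp/dt = -dH/dx, then dx/dt = dH/dp, over and over."""
    for _ in range(steps):
        p = p - dt * dH_dx(x, p)   # momentum first, from the current position
        x = x + dt * dH_dp(x, p)   # then position, from the updated momentum
    return x, p

# Start at rest, stretched to x = 1, and run for one full period (2 pi).
dt = 0.001
steps = round(2 * math.pi / dt)
x_final, p_final = evolve(1.0, 0.0, dt, steps)
```

After one period the mass comes back to very nearly where it started, with very nearly zero momentum and the same energy it began with. Put numbers in, turn the crank, get the answer.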

This basic set of mathematics is how we send probes millions of miles into space and have them hit a particular position over (say) Jupiter 5 years from now exactly when we think they will.

We will not really concern ourselves with the mathematical details of all of this, but there are two important thoughts to keep in mind here:

In the above model, every “thing” that we study carries a definite value for these two attributes that we are calling “position” and “momentum.” These two values completely define the behavior of the objects that we are studying using this framework. So, the objects move through space on smooth and *completely predictable* paths, and it seems like their current state (position and momentum) is absolutely determined by their past state.

More importantly, the model above directly computes all possible values of x and p that could possibly exist. That is, when you put your numbers in and turn the crank the numbers that come out are always, within the limitations of experimental error, the numbers that you see when you look at the real world. So you can, for example, throw a ball in the air and carefully track its position and speed at all times, and it will match the formulas pretty much perfectly. Not a lot of mystery.

By the end of the nineteenth century physics had developed two very successful models for how the world works: mechanics and electromagnetism. Both fit into the mathematical and intellectual framework outlined above: behaviors determined by smooth and deterministic differential equations that compute values that are “real” in the actual world. Life was good.

The problem was that it didn’t work.

Quantum mechanics was originally born to describe the motion of atoms and things related to atoms. The development of the theory was driven by the experimental discovery of a host of behaviors that “classical” physics could not explain:

The behavior of the so called “black body” radiation.

The photoelectric effect.

The puzzle of why atoms were stable, when according to classical E&M they should immediately collapse.

The appearance of spectral lines at discrete frequencies in the spectrum of an atom.

“Spin” and all that.

The famous two-slit experiment.

And so on. All of these experiments are related to the “motion” of atomic (very very small) particles and radiation. The puzzling thing about these experiments with atoms and light was that while we think of atoms and their constituents as “particles” some of the behaviors that were observed only make sense if you model them as “waves”. On the other hand, classical E&M models light as a wave … but some of these experiments (the photoelectric effect) only made sense if light behaved more like a “particle”.

Over the first quarter of the 20th century various ad-hoc models and ideas were proposed to explain these things. But it wasn’t until the late 20s and early 30s that all of these ideas were codified into a more or less unified theory that we call quantum mechanics. The answer, it turned out, was to model material particles as waves, or “wave functions” that are solutions to a particular differential equation, the famous one from Schrödinger:

i \hbar \frac{\partial}{\partial t} | \psi(x, t) \rangle = H | \psi(x, t) \rangle .

Here the odd notation | \psi \rangle is used to denote the wave function of the quantum particle. I will go more into where this notation comes from in the next part.

The rest of the formula seems familiar enough on a surface level. H is again the Hamiltonian, and as before is related to the total energy of the system you are studying.

In fact, if you noodle around with this formula in just the right way you can come up with some mathematics that does a pretty good job computing the energy levels of the lines in the spectrum of the hydrogen atom. Recall that hydrogen is made up of a single proton with an electron whizzing around it. To explain the spectrum Bohr famously built a hypothetical model of the atom where electrons can only sit in a certain set of orbits that each have specific fixed energies. It turns out that when you set up the equations correctly you can find solutions to a version of the Schrödinger equation that give you wave functions at exactly these energies. You don’t get orbits, but you do get the so-called “stationary states” that are completely stable and match up with the spectral lines perfectly. So in some sense the electron is just sitting there waving around in some space in one of many possible fixed configurations. You’ve seen the pictures of the electron shells, right?

So what we have learned is that we can use Schrödinger’s equation and some smarts to tell us “where” the electron is in the atom. As ever, I will not go into the details. There are any number of books that will explain this to you. For example, Jim Baggot’s book is good for the physics point of view, while Stephanie Singer covers much the same material from a more mathematical viewpoint.

All of this makes you really want to believe that the wave function describes some sort of physical wave-like *thing* spread over all of space (x) and time (t) that will tell you something about the relationship between “where the particle is” and “what the energy is”. The fact that photons and even electrons create interference patterns that are very much like the ones you get from water waves in the two-slit experiment (see Feynman’s famous description here) makes you want to believe this even harder.

But, sadly, this is not so.

The waves in classical mechanics are an aggregate phenomenon created by the motion of lots of things (air molecules, water molecules, etc) at once. Even more abstract entities like electromagnetic waves still have a sometimes visible macroscopic manifestation (let there be light!). In addition, as I mentioned before, the classical equations, in some sense, describe behavior that you can *directly observe*. You know the waves are moving through space on a particular trajectory because you can look at (say) the sky and *see* the light shining down on you.

The quantum wave function is nothing like this. Those complex numbers that are waving around are doing so in a space completely disconnected from the real world. In particular, they don’t tell you where the photon or electron *is*. Instead all they tell you is something about the chance that you have of seeing it somewhere if you look there.

But they don’t even tell you this probability directly. Instead, to get probabilities you have to compute something called the *norm* of the wave function, which is a measure of its overall magnitude … like its length if it were a piece of string. We write the norm of the wave function like this: |\psi| or |\psi(x,t)|. If you know how to compute it then the probability of finding an electron (say) at point x in space would be

P(x,t) = |\psi(x,t)|^2 .

Computing this norm usually involves some kind of fancy integral. This interpretation of the wave function is called the *Born Rule*, and I’m not going to go into the particular details of how one computes these things here. I will say though that this formula explains the interference patterns that you get in the two slit experiment. This computation turns up in a lot of “beginner” books on quantum mechanics, including the one by Feynman that I linked to above.
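To give a flavor of how the squared magnitude produces interference, here is a toy sketch, nowhere near a real calculation. I am assuming each slit contributes a complex amplitude of size one whose phase depends only on the path length to the screen, and the wavelength, slit separation and screen distance are numbers I picked. The results are relative intensities, not normalized probabilities.

```python
import cmath
import math

WAVELENGTH = 500e-9        # meters (made-up value)
SLIT_SEPARATION = 50e-6    # meters (made-up value)
SCREEN_DISTANCE = 1.0      # meters (made-up value)

def amplitude(path_length):
    """Unit-size complex amplitude whose phase tracks the path length."""
    k = 2 * math.pi / WAVELENGTH
    return cmath.exp(1j * k * path_length)

def path_lengths(y):
    """Distances from each slit to the point at height y on the screen."""
    r1 = math.hypot(SCREEN_DISTANCE, y - SLIT_SEPARATION / 2)
    r2 = math.hypot(SCREEN_DISTANCE, y + SLIT_SEPARATION / 2)
    return r1, r2

def intensity_both_open(y):
    """Add the amplitudes first, then take the squared magnitude."""
    r1, r2 = path_lengths(y)
    return abs(amplitude(r1) + amplitude(r2)) ** 2

def intensity_no_interference(y):
    """Take the squared magnitudes first, then add: no cross term."""
    r1, r2 = path_lengths(y)
    return abs(amplitude(r1)) ** 2 + abs(amplitude(r2)) ** 2

# Height where the two paths differ by about half a wavelength.
y_dark = WAVELENGTH * SCREEN_DISTANCE / (2 * SLIT_SEPARATION)
```

Adding the amplitudes before squaring gives a bright spot at the center of the screen (`intensity_both_open(0.0)` is about 4) and an almost completely dark one at `y_dark`. Squaring each amplitude separately and then adding gives a flat 2 everywhere, with no pattern at all.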

This rule feels like the luckiest in a series of lucky guesses. But it is undefeated in terms of experimental confirmation. Every experiment that has been done in quantum mechanics has amounted to thinking about a wave function, defining the right Hamiltonian, and then computing probabilities with the Born Rule, and the numbers are always right. Sometimes they are right to a ludicrous level of precision too.

In the famous double-slit experiment, for example, you send a beam of photons through one screen that has two very thin slits cut into it. Then you put a set of detectors some distance away behind this screen. The mathematics of quantum mechanics will tell you that you should see an interference pattern on your detector array. It will even tell you the exact shape and configuration of the pattern. If you work hard enough you can probably compute this configuration with a stunning level of precision.

But quantum mechanics can’t really tell you anything about “what happens” to any single particle while it travels between the slits and the detector wall. The theory says nothing about it.

This, I think, is the first great mystery of the theory. It’s not so much that you can only compute and predict probabilities; there are many physical processes for which that is true. The real puzzle is that while the mathematics that I have hinted at above gets all the right answers, it does not appear to provide any insight into any actual physical process from which those answers can be derived. That is, your experiments always work, but it’s never really clear what is “really going on” in the “real” world.

Worse, as is well known, if you try and figure out what happens for yourself by (say) *looking* at each one of the slits to see which way the photon goes … the whole experiment falls apart and you get no quantum interference. Instead the act of measuring the position of the photon in some way seems to lock you into a history where all the photons suddenly take a single well defined path to the detector array, rather than creating the wavy interference that we got before.

This is, as you can imagine, a very unsatisfactory situation. Physics is supposed to tell you *what happened* and *where things go*. Classical mechanics seems to do this perfectly, right down to having an exact and satisfying connection between the mathematical model and what you observe in the real world. We get none of that in quantum mechanics. It is more like a computer program that always spits out the right answer but for which you do not have the source code, so you can’t reason about the exact mechanism by which the answer was generated.

In addition quantum mechanics seems to make you accept a world where the equations that tell you how systems evolve behave one way (the smooth Schrödinger equation) when you leave them alone and another way (no interference) when you look at them. This is one aspect of the so called “measurement problem” and a lot of people smarter than me have thought about it and still find themselves confused. I am also mostly confused about this, but it will take a few more details to get at the core of why.

See you later, in part 2.

If what I have written makes no sense or you want to figure it out for yourself, here are some better sources than this humble web page.

Travis Norsen’s Foundations of Quantum Mechanics is a great introduction to this material. A good combination of nuts and bolts physics and discussions of the conceptual issues.

Baggot’s Quantum Cookbook is a good semi-historical treatment of early QM.

Stephanie Singer’s algebraic treatment of the hydrogen atom is also enjoyable, but much more technical from a mathematical point of view.

This series of lectures from Allan Adams at MIT is very good.

Sean Carroll’s book is OK, as is Philip Ball’s book. They are both good non-technical explanations of the conceptual problems in the theory, to the extent that this is possible. Sabine Hossenfelder’s Youtube channel is also a good source for material at this level.

On a more technical level, this paper about “Quantum Myths” is a nice antidote to the sort of woo woo mysticism that too much of the writing on this subject indulges in.

You should also read all the John Bell stuff, and various things by David Mermin.

So about a month ago I had managed to get past this fight with a solo caster, which as all *Souls* veterans know is the easiest way to solo anything. Even so it took a week or so of noodling around and getting used to the rhythm of things, especially the big change up in phase 2. She becomes much more aggressive in phase 2 and your attack windows become much smaller.

My next goal was to try and get past the fight with minimal casting. To do this I had been running a strength/faith build, mostly using fire damage swords and such since I could not find a faith spell that really worked for general purpose combat.

Let’s see how that went. Yeah, it was going well.

Actually, this video makes it seem worse than it was. I spent a long time trying to work out what my approach was going to be, and died a ton while doing this “research” since I did not really practice any one approach very well. Once I picked something it took about a week to build up enough muscle memory to get through both phases.

I saw three possibilities:

Parry strategy. This works great if you have the reflexes. Which I do not. More later.

Blasphemous Blade strategy. This takes advantage of my faith/melee build to do maximum burst damage because she is weak to fire. It has other issues though. And, the weapon skill on this weapon is a bit too … “casty”. That is, you do a lot of damage from distance if you have enough time to get it off. But those windows are more rare than with the spell caster because the wind up for the attack is too long.

Something else.

Let’s go over these one by one.

Parrying Malenia has two main problems:

You have to learn all the moves that you can parry. There are about six which I will list below.

In a new twist Elden Ring makes you do *three* parries for every critical against the major bosses. This is to combat the “Gwyn Problem” where if you learn how to parry just well enough the fight is trivial. Needing to constantly get three in a row means that you have to be so good you hardly ever miss.

The main moves that Malenia does which you can parry are:

The short forward dash followed by a wide swing.

The “sword click” followed by a forward dash followed by three fast swings.

The jump to *your* left followed by a swing.

The jump to *your* right followed by a swing.

Two immediate fast swings.

The jump and quick twirl in the air, followed by a swing.

I should have cut together a video showing examples of each of these and me missing. But I forgot to do that before getting past the fight. This Reddit thread describes the moves in some more detail. And this video and this video show most of the moves. But the guy hits all the parries so it doesn’t help to teach you how this fails.

Of course she has other moves you can’t or shouldn’t parry. You can’t parry the spin kick that she uses to get into a lot of her other combos. You can’t parry Waterfowl. You also can’t parry some of the new stuff she does in phase 2. I think you *can* parry the move where she jumps up in the air and then dashes forward and stabs you to death (which she always does after you heal, because she’s a cheater) but it’s not worth it because the attack is so easy to punish anyway. Finally, you don’t need to parry the swing where she swoops up into the air and then lands and asks you to please hit her.

The problem with the parry strat for *me* is that I can only parry the first three of the six moves above with any reliability. God knows I tried. It seems to me that you not only need to get the timing *just right*, especially for the faster moves, you also need to get your spacing *just right* and your parries can fail if you are just a bit too far away, or too close, or not perfectly lined up left to right.

Worse, when my parry misses I don’t know why, because it feels like I did it just like before. If I could even have done five out of six I might have tried for this more because you can then dodge to the other. The two I just could not learn were the two fast swings and the floaty jump to the right. I could get each of them about half the time, but could never figure out what I did wrong when I missed.

So I finally gave up.

I did the fight this way for a while, but didn’t like the feel of it. Later, I also tried dual wielding this sword and a second large sword (Claymore) with fire and doing jump attack spam. That worked pretty well too, but I found it hard to keep the right spacing. So let’s talk about spacing in this fight.

Spacing for this fight is not an issue for the first quarter of her health bar. But, once she is down to 60% or 70% health, she will bring out Waterfowl Dance seemingly at any time, even in windows when you used to think you were safe doing a big attack. Like at the end of this video:

Here she does that jump dash, which you can roll through and then punish. But *not this time*. Joke’s on me.

So, what you want for this fight is something that can poke her for damage from pretty far away, so you can keep your spacing to escape things like Waterfowl as long as you are careful.

The great swords like the Blasphemous Blade and the Claymore are great, but a bit short for this. So I went looking around for other ideas, while at the same time continuing to noodle with this combo.

Note: I realize that if you are good you can evade a point blank Waterfowl Dance using that spin in place and then dodge out at just the right second move … or you can use Bloodhound’s Step, or you can switch to a shield for the first flurry and then dodge out. All of these things will work for you if you have faster muscle memory than I do but they don’t work for me because I don’t. I can only remember two or three moves at a time in a fight like this and needing to switch up in a window even as relatively long and telegraphed as the wind up to WFD is hopeless for me. Believe me I tried. It just never worked.

So the main other thing I tried was to use longer weapons with moderate bleed and/or frost to get my pokes or longer jump attacks. Obviously Rivers of Blood would also have worked here, but I’d already done that.

The real problem with this idea was that this character was a Faith/Strength build and all the moderate bleed weapons are Dex weapons. I tried to do some moderate respecs to get more Dex damage but ultimately didn’t have the patience to try and get to a 50/50/50 Faith/Strength/Dex build … which even if I had done it would have been the ultimate try hard move anyway.

Still, for those of you with Dex builds who don’t want to use Rivers … try cold katanas. These are nice because you can get two kinds of burst damage: one from the frost procs and one from the bleed. I used an Uchi and the Nagakiba with dual wielding. Both are great. They just did not hit hard enough with my build so the fight took too long.

I also tried the Bloody Helice for a while because I saw someone using it in a youtube video. Again, this could have worked on a Dex build. But that’s not what I had.

At this point I had to take an enforced one week vacation from this job to visit some folks and eat Chinese food and stuff. While I was away, I found the way that I would ultimately beat this fight, and of course it was Emarrel who showed me what to do:

This person has been showing me how to play these games better for almost a decade now, and here they are using a giant strength-oriented sword to beat this fight.

Since I had this weapon already, I gave it a try when I got back. And after a week or so it finally worked.

The reason it worked is that the poke attack with the Greatsword (and all the other colossal swords) is super overpowered. It’s as fast as the katanas, faster than the medium sized swords, has fast recovery, and hits harder and from a longer distance than almost anything else. Finally, you can get the poke out of a roll *or* a crouch, so it’s easy to get into as well.

So here is the setup:

Greatsword and 60 strength.

Giant Hunt ash of war, which is also a sort of forward poke and has the added bonus of often throwing the boss up into the air and leaving her to land in a heap. Always fun. Also very good damage for not much FP use.

Put Cold on the sword, for the occasional frostbite burst damage.

Various other body buffs, the strength and stamina Physick for phase 2, and the Godrick rune. You can save scum to not run out of rune arcs and buff items if you want.

Now just keep your spacing, do an R1 poke for fast attacks, and L2 poke for bigger hits, and take the frost bonus when you get it.

If you are better than me you can also chain a bunch of pokes and maybe a jump attack or two to get a stagger for even more burst damage. But I’m not aggressive enough to get this to happen that much. When I try too hard for the stagger I mostly end up out of position and get Waterfowled to death.

Speaking of Waterfowl. If you watch the Emarrel video you will notice that you can dodge this attack if you have almost any distance between you and Malenia when she does the first hop up in the air. So the things to avoid are:

Ending up directly under her when she starts the thing.

Running away too slowly at the start and getting murdered by the first flurry.

Still, I have recently managed to survive this attack even being as badly out of position as in this video:

So maybe some day I’ll be able to avoid it even point blank. But I would not get my hopes up.

What I do instead is try to keep enough distance to be able to run away from the attack and use the long range of the sword pokes to get damage in. This is actually the hardest thing about phase 1 of the fight. She is so passive in phase 1 that to make any progress in the fight you have to walk about and try to hit her in the face. But if you do that you risk being right next to her when the Waterfowl Dance starts.

I ended up using Lightning Spear from range to try and bait out the Waterfowl when I know it’s likely coming anyway. This works remarkably well at least once or twice per run.

Once you are pretty good at phase 1 you just have to see phase 2 enough to learn how to parse the visual cues with all the wings and shit on the screen and get used to the much higher level of aggression. Where in Phase 1 she does a lot of slow walking followed by long combos, in Phase 2 she really doesn’t let up at all and you have to get a tactical poke in here and there to interrupt the flow of attacks. Otherwise you just lose control and die. Like here:

When you can put this all together, and get some good RNG, you will finally win, and it will feel pretty good.

I’m amazed that I only took one hit in that run. I wonder if I can duplicate the effort at level 1. Check back here in six months to see how I do.

After all this time with this fight I still mostly like it. The main obvious bullshit is the blatant input reading (dash attacks after every heal, Waterfowl whenever you over-commit and get too close) that the boss is doing. Phase 1 is also too slow, while phase 2 is too fast. But for me these are minor issues. It’s still the most interesting fight in the game, and I would not really mind going right back to it again right now.

Ten things I forgot to mention in my previous page on *Elden Ring*.

Soul, er, blood echoes, er, whatever farming. There is a great spot where, in the grand FromSoft tradition, you can do pretty fast farming on dumb enemies. You have to do the quest of Varre who you meet in the opening area of the game. It takes a bit of work, and you need to kill one great rune boss. But then you can get infinite levels for a while as long as you have a bow.

Just to be more explicit than I was in my long ramble: Leveling Dex, Faith, and Arcane, running the Uchi and some bleed stuff through the early and mid game, and then melting the late game with the mimic, *Rivers of Blood*, *Swarm of Flies*, and *Rotten Breath* is a pretty happy way to get through without too much fuss. Even Malenia is not *too* hard with this combo. Of course, get the *Pulley Bow* for the bow cheese!

Caelid and Volcano Manor are where all the early power items are. Of course, Caelid and Volcano Manor are both horrible places.

The single worst navigational path in the game is how you get to the other part of the Altus Plateau (where Volcano Manor is) from the part you see first. It’s like driving from Pittsburgh to Cleveland via Toronto.

I tried dual wielding the biggest swords in the game and it was kind of comical.

The jumping attack talisman is sort of over-powered. But of course you have to be good at landing jumping attacks.

One of the most interesting things about this game is all of the different stacking buff items (talismans, armor, physicks, spells) that increase your stats, or your damage, or both. The different combinations are a dizzying puzzle if you are a bit too much into min/maxing. It’s not too hard to find combos that double or triple the damage of specific attacks without necessarily needing to be in some low health “RTSR” mode like in the older games. In my magic run at Malenia the *Night Comet* spell I used started at about 500-600 damage in a vanilla setup. After the buffs (and a few more levels for a different staff) I got it to do 1300-1500 or more. Fun stuff.

Here is the setup: 65-ish INT, Starscourge Heirloom for +5 INT, Marika’s Soreseal for +5 INT and +5 Mind, Godfrey’s Icon for +15% damage on charged attacks, Staff of the Lost in the off hand for +30% damage on Night Sorceries (you only have to hold the staff to get the buff, handy!), Magic Shrouding Cracked Tear in the Physick potion for +20% magic damage (this lasts 3min, so at the end of the fight it was done). I also used *Rennala’s Full Moon* for the 10% magic resist debuff, but this only lasts for the first minute of the fight. All the buffs multiply together to get to 2-3x of my original base damage. Super fun.

The crazy NPC quest with the talking pot is funny.

The various stat buff items, and the Serpent Hunter spear in Volcano manor are what make me think I could possibly do a low level run of this game. Need to find some friendly spirits to use too. Maybe *Dung Eater*.

Speaking of which, the *Dung Eater* quest is pretty funny/tragic too. And what weird armor.

Bonus #11: Have to try the bubble horns on a faith build. They look stupidly powerful. Yes, a giant horn that shoots magic bubbles.
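For the min/maxers, here is a rough sketch of the buff arithmetic from the Night Comet setup above. It assumes every listed percentage is a plain multiplicative factor, and that the 10% resist debuff acts like a 1.10x damage multiplier. Both are my simplifications and not the game's real damage formula. The names `BUFFS`, `total_multiplier`, and `buffed_range` are made up for the illustration.

```python
# Percentages as listed in the setup above. Treating each one as a simple
# multiplicative factor is an assumption, not the game's actual formula.
BUFFS = {
    "Godfrey's Icon (charged attacks)": 1.15,
    "Staff of the Lost (night sorceries)": 1.30,
    "Magic-Shrouding Cracked Tear": 1.20,
    "Rennala's Full Moon (resist debuff)": 1.10,  # assumed to act as x1.10
}


def total_multiplier(buffs):
    """Multiply all buff factors together."""
    total = 1.0
    for factor in buffs.values():
        total *= factor
    return total


def buffed_range(low, high, buffs):
    """Scale a base damage range by the combined multiplier."""
    m = total_multiplier(buffs)
    return round(low * m), round(high * m)


multiplier = total_multiplier(BUFFS)
print(f"combined multiplier: {multiplier:.2f}x")
print("Night Comet from 500-600 base:", buffed_range(500, 600, BUFFS))
```

Under these assumptions the four listed multipliers alone come to roughly 2x. The rest of the jump to 1300-1500 would have to come from the extra INT and the staff change, which this sketch does not model.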

I have not written anything since February because of *Elden Ring*. So here we go.

*Elden Ring* is the new game from FromSoft. But you don’t need *me* to tell you that. *Elden Ring* is everywhere. Even my brother tried it. They have sold millions and millions of copies, resulting in millions and millions of twitch streams, youtube videos, tiktoks and no doubt instagram stories (or whatever) about all the strange FromSofty things in *Elden Ring* that various people are running into for the first time.

But, I have some thoughts, so I will write them down. As always, this overview will tell you things that you could have already learned from other places on the Internet weeks or months ago. And there will be spoilers. Even the title is a spoiler.

So, is it good?

Yes, it’s good. It’s probably the best game in the “Souls” framework (i.e. not *Bloodborne* or *Sekiro*) since *Dark Souls*. The love is back baby.

Where to start. Let me explain. No wait. There is too much. Let me sum up.

New names for souls and bonfires, which people will just call souls and bonfires: check.

Giant castles: check.

Giant swamps filled with poison: check.

Villages of the damned? Check.

NPCs and vendors who disappear for no reason, thus locking you out from seeing entire storylines or buying supplies later in the game? Check.

Crazy weapons with crazy move sets? Check.

Andre the Blacksmith? Check (pretty much).

Burn the world down in order to restore order to a fallen society? Check.

Mediocre Dragon fights. Check.

Bow Cheese! Check.

A giant maze of under-city sewer tunnels that all look the same and are populated with terrible giant curse frogs but the actual path through the maze is actually three steps long: check.

Of course, the new game adds a few new things to the mix:

Horse!

Jumping puzzles!

Vertical navigation! I never thought I’d live to see a video game that embodied that relatively unique driving in Pittsburgh feeling of seeing the place you want to be 200 feet below you and having no idea how to get there.

Crafting! Mostly useless. But kinda cool. I am of course famously against crafting.

So many dragon fights.

Giant underground space cities.

And, as usual, FromSoft have also streamlined and softened various aspects of their trademark difficulty engine:

There are almost no long boss runs except in some of the optional dungeons. And even when there are there is a secondary checkpoint system that allows you to avoid them if you want in most cases.

Weapon upgrades are a lot simpler, though still too complicated. There are only two kinds of upgrade stones. There are a ton of them. And, the game now completely separates weapon upgrades from things like elemental (fire, lightning, holy, magic, bleed) damage affinity. The Ashes of War system lets you mix and match weapons with damage types as much as you want with no penalties for switching back and forth. I’ll have more to say about this later because it is by far one of the best things about the new game.

The spirit summoning mechanic is great for people who don’t want to play online, but also want a little help taking the edge off the intense aggression that modern FromSoft bosses tend to have. The spirits are often better than summoning other live players who don’t know the fights yet. They don’t increase the health of the boss and all that. And, if managed correctly they can basically beat the game for you if you want.

The “open world” aspect of the game allows you to distract yourself from the boss you can’t beat by running around and collecting flowers, pots, rocks and clearing mini-dungeons and mini-bosses until you feel like beating your head against the brick wall again. This ability to change the pace of the game is nice.

I had played the game for maybe 50 or 60 hours before realizing you can warp from anywhere you might be standing, and not just from checkpoint locations. The Stockholm syndrome is real.

As a whole I think this video game takes everything that the fans loved about the first *Dark Souls* and brings it up to a somewhat higher level of refinement and execution. I have more detailed thoughts about what they did particularly well below.

As far as overall PVE gameplay goes I think melee characters with utility casting are still probably the most straightforward and strongest for the whole game. Dex/Faith/Bleed in particular seems to be the OP build of choice. That said, you can play the whole game as a caster and have that nice secure caster feeling through the whole game if you do the set up right. This is a welcome change from the later Souls games where it seemed almost impossible to get enough power behind a magic run early in the game. But maybe I was just bad at it.

I have not dabbled in PVP yet, except for one quest-line that required it. I might have more thoughts on that later. Maybe.

OK. Now on to some more specific details.

For me the juicy meat center of the *Souls* games has always been the combat system and the wide variety of weapons and move sets available in the system. No other game series has a combat system that combines relative simplicity (only two kinds of attacks) with the ability to chain various moves into satisfying combos without having to memorize particular sequences of button mashes. Also, the true joy of the games, and the core of their almost infinite replay-ability was always redoing stuff you’ve already done before, but with a different combat style and/or different weapons. In my thoughts on Dark Souls 3 I expressed a bit of disappointment in this area, having found no weapon more fun than the relatively lowly long sword with which to beat the game. So I went back and beat *Bloodborne* twice more.

You will find no such complaints from me with regard to *Elden Ring*. The magic is back … and the lightning, and the holy damage, and the bleed. My god the bleed.

I think this return to form comes from two main sources:

They have somehow figured out how to add some new twists to the already wide universe of move sets and weapon styles from the previous games (two words: bubble horn).

The Ash of War skills are a version 2.0 of the *weapon arts* mechanic from *Dark Souls 3*, something that I spent that entire game ignoring because I could find nothing interesting to do with it. In *Elden Ring* the Ashes of War serve two purposes. First, they add a “skill” or special attack to the weapon (on L2). Unlike the weapon arts from *Dark Souls 3* there are moves here that are incredibly fun to spam over and over again and also coincidentally melt even the hardest in-game enemies, including some of the hardest bosses. So win-win.

But Ashes of War *also* allow you to adjust how the damage on the weapon scales. So, instead of having to choose between 15 different upgrade paths each with their own special upgrade stones you can move a weapon instantly from strength, to dex, to int (magic and cold), to faith (fire and lightning), to bleed and then back again whenever you want by resting at a checkpoint. This way if there are no (say) faith or magic weapons with a move set that you like, no problem! Just take a (say) Claymore and then put it on whatever track you want.

But wait! There is more! Some of the ashes are not really even related to doing *damage* with the *weapon* directly. The most obvious of these are the buffs, especially the buffs involving bleed (oh my god the bleed). There are skills that mimic various spells that you would normally only be able to cast if you had built a caster. There are skills that are there just to stun lock enemies. Finally, there are some skills that are just better dodge rolls and have nothing to do with damage at all.

Over all the Ashes of War are a brilliant refinement of a mechanic that I honestly never gave a shit about. But now I’ll actually run around the game for half an hour to find some obscure skill just to play around with it.

A few favorites:

Taker’s Flames on the Blasphemous Sword (a boss weapon no less!), which does huge fire damage and also heals you from the damage.

Whatever the crazy gravity move is on Radahn’s Sword (another boss weapon!)

Everyone’s favorite L2-spam boss melter: Corpse Piler from the Rivers of Blood katana. The related, but much weaker Bloody Slash is also fun.

The Artorias flippy flippy attack that’s on the Claymore (Lion’s Claw).

Golden Vow, a 20% damage buff. No faith needed.

Square Off, the default on the Long Sword, does *ludicrous* damage in the early game.

I’m told that Bloodhound’s step is great. I haven’t tried it … yet.

The first two items on this list remind me to cover one other aspect of the weapons in this game. The boss weapons are actually good! Well, at least three of them are (Rykard, Radahn, and Malenia). This is unheard of.

Finally, no discussion of the weapons is complete without cheering for the return of the giant pizza cutter wheel from *Bloodborne*. What a great meme weapon:

The video also shows one of the hilarious new status effects in the game: sleep. There is even a sword that deals “sleep damage” as its main thing. Usually sleep is just a softer stagger effect, but the asshole in the video actually goes to sleep and lets you murder him. I find this endlessly enjoyable because I am a child. You do the same thing later in the game when you have to fight both of these assholes at once:

I will never not love this.

Bow cheese gets its own section, because I love it. Any time there is a door that is too small for a large enemy, like fat ogres:

or a giant worm dragon:

or a giant dragon dragon:

Or anytime your foe walks slowly in straight lines:

Or is stuck somewhere and can’t path to you:

You are all set my friend. Just R1-R1-R1-R1 until they die. I will even circumnavigate a giant castle to bow cheese something:

Bows in this game are great. You can even get a move that apes the terrifying rain of arrows that one of the bosses uses against you. I do not make the best use of it in this video:

But with the right build, this move melts things.

Speaking of builds. The fairly open game world of *Elden Ring* makes it perhaps more straightforward than in the other From titles to run around the map picking up useful items and upgrade materials before getting on with the business of “progressing” the game. Of course, the main places to do this seem to also be the most hostile to low level characters (poisoned swamps, towns built on lava, that sort of thing) and as always some of these schemes involve dying on purpose. But, as long as you are good at running away, you can get yourself a pretty powerful setup fairly quickly.

The best things to go after, in order of easiness are:

The Radagon soreseal, which gives you +5 to health, endurance, strength and dex. The extra health and endurance are useful even for non-melee characters. And the strength and dex points will let you get to the minimum stats needed for most of the weapons usable early.

Somber smithing stones. It’s not too hard to get sombers up to 6 without killing much of anything. You used to be able to get up to 9, but I hear that they finally patched out the jump you need to make to get the 7. So you can’t do that anymore.

Various early golden seeds and sacred tears. For healing potions.

Regular smithing stones. You can get the stones up to 5s (and a few 6s) in various mines and tunnels. Just don’t fight the bosses until you really need to. Getting more 6s, and the 7s and 8s is harder. You get three normal weapon levels from each class of regular smithing stone, so +15/+18 normal is sort of like +5/+6 somber. But you need more regular stones to get there (12 for each set of 3 levels) … and in general it’s much more of a pain to get a regular weapon to high level than the somber weapons.

Lots of other buff items, mostly talismans, but also weapon skills and spells and such. Golden vow, the flame damage buff spell, and the physical damage (Strength and Dex) physicks are the best things to get here.
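To put numbers on the regular vs. somber stone comparison in the list above, here is a small sketch. The 12-stones-per-three-levels figure is from the text. The one-stone-per-level rule for somber weapons is my assumption, and the function names are made up for the example.

```python
STONES_PER_TIER = 12  # from the text: 12 regular stones per set of 3 levels
LEVELS_PER_TIER = 3


def regular_stones(level):
    """Regular stones needed to reach +level, for levels that finish a tier."""
    if level % LEVELS_PER_TIER != 0:
        raise ValueError("only whole tiers (+3, +6, ...) are modeled")
    return (level // LEVELS_PER_TIER) * STONES_PER_TIER


def somber_stones(level):
    """Assumption: a somber weapon takes one somber stone per level."""
    return level


for regular, somber in [(15, 5), (18, 6)]:
    print(f"+{regular} regular: {regular_stones(regular)} stones, "
          f"+{somber} somber: {somber_stones(somber)} stones")
```

So if those assumptions hold, the "roughly equivalent" regular weapon costs about twelve times as many stones, which is why the regular upgrade path is so much more of a pain.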

To get an idea of how to approach this watch some of the speed runs on youtube. Speed runners are the best at building powerful characters fairly quickly. Although sometimes the stuff they do to make this happen is impossible for mere humans to duplicate.

The one stat buff item aside, your main concern is leveling weapons. Weapon leveling is the key to getting powerful in FromSoft games, and *Elden Ring* is no different. Character leveling is mostly for health, and thus latitude for making mistakes. In general before the late game you can plan on getting enough stones to upgrade one or two somber and one or two regular weapons to close to max level. You can’t really do more than this, so keep this in mind when budgeting upgrade materials.

As I mentioned above, one happy development in *Elden Ring* is that it’s not too hard to run a mostly casting magic person through the whole game, without needing to spend most of the early game playing melee while your casting sucks. You have to be a bit picky about which staff you use, and getting a good setup depends a lot on various buffing strategies. But overall it’s pretty viable and fun to just sit back and R1-spam little blue bolts of death (or large purple rocks) and not have to think about learning dodge timings.

My favorite spells right now are:

Great Glintstone Shard because it hits harder than Pebble. Pebble is also a dumb name for a thing that is supposed to deal death.

Night Comet because certain bosses don’t dodge it.

Rock Sling for the stagger.

The giant Comet spell is useful sometimes, but hard to set up.

I guess I should try the flurry of stars spells (Star Shower? Stars of Ruin?) but I didn’t follow the right NPC quests to get them.

I also tried doing a more casting oriented Faith build, but it doesn’t work out. Especially in the early/mid game I could not find an incantation that hit hard enough to use as a main thing. So I got the creepy fire sword from the snake guy instead.

Both Faith and Magic users seem to have a wide variety of small fast weapons (so many katanas) to choose from to supplement casting with melee. The variety of faith weapons seems more extensive and includes all the bleed stuff and all the light sabers.

The choice for casters who also want to use giant swords is a bit more limited. But you can do it. Especially if you use magic just go get the Radahn Sword (the Starscourge Greatsword). It’s hilarious.

I like the “New Chalice Dungeons” (Tunnels and Catacombs) because they feel more connected to the game and less random than the Chalice Dungeons from Bloodborne. The fact that they are not randomly generated is also a win. The “Evergaol” (“Everjail”) mini-boss fights are also fun.

There have been complaints about the balance of enemy difficulty vs. reward. And I will echo these complaints. It’s odd to have a certain class of super tanky respawning level enemy that can be harder to kill than many of the final bosses and gives you almost nothing in return. Not even a lot of souls (no wait … blood echoes … no wait … whatever).

There have been other complaints about recycled bosses. I am less sympathetic to this complaint. I need all the practice I can get learning these complicated boss move sets. So having multiple tries at it is a win for me.

Fuck that tree level man. It’s so mean.

Another combat mechanic that comes from the earlier games that I am still trying to like but mostly just suck at is dual wielding. You can do some stupid damage this way, especially, of course, with bleed and cold. I remain too uncoordinated to make this work, especially as I reflexively L1 to block and end up swinging instead. Maybe I’ll do a forced dual wield run.

I don’t like the new crystal lizards. Half of them drop nothing. Which stinks and is a waste of time. Do better FromSoft.

So many buffs. So many different kinds of buffs. I have not even talked about the Wondrous Physicks.

Oh. I can’t forget about the single most disturbing FromSoft enemy ever:

Luckily, burning them is hilarious:

OK. With all that out of the way let’s talk about Malenia.

Malenia is an optional boss at the end of the also completely optional “Haligtree” area of the game. This is one of the toughest areas of the game, even if you are at end game stats, so it is appropriate that Malenia is one of the toughest fights in any recent FromSoft game. On an overall difficulty scale she can be right up there with Isshin, the three phase final boss in *Sekiro*, and the Orphan of Kos from *Bloodborne*. The fight is mechanically rich, hard to learn and parse, and hard to control when you are in the middle of it. And yet, if you collect up enough help, luck, or memes it can also be on the trivial side. You can watch a youtube video of someone doing this fight and never actually duplicate their exact experience unless you know *exactly* what they were doing. It is this multiple nature that makes the fight so interesting.

The first time I beat this boss I did it with the mimic, brute force, and a lot of luck. That was fine, for the time, but I immediately felt sort of unsatisfied. So after I finished my first playthrough I immediately started two more, one with a caster and one with a more mixed melee and faith character, partly to get a first or second look at stuff I missed, and also to get back to this fight to learn it again.

My second time through the fight was with a caster, and I really wanted to do it without the summon. I was able to do this, though there are still parts of phase 2 that I can’t counter with 100% reliability. Phase 1 I mostly understand, and I am proud to say that I mostly learned to dodge her most famous move, the Waterfowl Dance multi-flurry attack. But I have to be far enough away from her to make it work. The trick is to run away from the first two bursts, and then dodge back into her and make her fly over you so she is too far away for the second two bursts to hit you. It’s like how you backstab the Orphan in the first part of that fight.

Getting through phase 2 without a friend is really tough because her aggression is pretty relentless, and some of the attacks in phase 2 come out looking really similar, so you do the counter for the wrong thing and then die.

Here is what it’s like when you lose control. This happens easily in Phase 2 but I’ve had plenty of this feeling in Phase 1 as well.

The next frontier is to do the fight solo and with mostly melee instead of mostly casting. I have not yet managed even one solo, melee-oriented fight where I got to phase 2. Mostly I can’t keep from getting hit, and the way the fight is designed she ramps up her aggression on every hit she lands until finally the whole thing gets out of your control and you die.

The next next frontier will be to try and do the fight (and the rest of the game) at low level. I think this game lends itself to low level runs because even low level characters can use super powerful weapons. But, perfectly avoiding all damage will, of course, be really hard.

This fight will keep me interested for a long time to come, much like Manus in *Dark Souls* or the Orphan in *Bloodborne*. It has a lot of different layers to it, which makes it fun. And, it really rides an almost perfect satisfaction curve as it progresses from looking completely impossible (Waterfowl Dance!?!?!?), to randomly deadly, to sort of controllable as you understand it better and better. You will never be happier and sadder at the same time to finally win a fight in a video game than when you first beat this one. I wonder whether they built this fight first, and then put the rest of the game around it. None of the other late bosses in *Elden Ring* are even half as interesting, so I like to think that this was the case. So of course in true FromSoft fashion they trolled us all by making it “optional”.

Here’s my one solo win. I’m stacking three or four different magic buffs to get that damage (the magic damage wondrous physick, a magic damage talisman, the charged attack talisman, the staff of the lost on the off hand that buffs night comet, and the magic resist debuff from the full moon spell). I’d describe it all in more detail but this post is already too long. So maybe next time.

In my opinion this is the fight that makes the game. I really like it. Playing the game without doing this fight is almost to have not played the game at all. The weapon you get from winning is also great.

Long time readers of this site will know that I have spent a non-trivial portion of my adult life writing Objective-C code for the NeXT^H^H^H^H MacOS and iOS platforms at various stages of their development.

One quirk in the Objective-C runtime that every programmer needs to deal with is the following strange behavior: if you call a method on an object that is `nil` the call just falls through like nothing happened (usually).

That is, if you make a call like this:

`result = [someObject someMethod:someArgument]`

and `someObject` is the value `nil`, so it points at nothing, the behavior of the Objective-C runtime is just to ignore the call like nothing happened and effectively return a value similar to zero or `nil`.

In the more modern versions of the runtime, the area that the runtime uses to write the result of the method call is zero’d out no matter what the type of the return value will be. In older versions of the runtime you could get into trouble because this “return 0” behavior only worked if the method returned something that was the size of a pointer, or integer, on the runtime platform. And on PowerPC if you called a method that returned a `float` or `double` you could get all kinds of undefined suffering.
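If you have never lived with this behavior, here is a loose analogy in Python. To be clear, this is not how the Objective-C runtime works internally. It is only a sketch of the observable fall-through, and the `Nil` class and the `nil` name are invented for the illustration. It also glosses over the typed-return details above by handing back the same falsy object for every call.

```python
class Nil:
    """Stand-in for Objective-C's nil: absorbs any message sent to it."""

    def __getattr__(self, name):
        # Called only for attributes that do not otherwise exist, so every
        # unknown method resolves to a function that ignores its arguments
        # and returns the nil-like object again.
        return lambda *args, **kwargs: self

    def __bool__(self):
        # Behaves like a zero/nil value in conditionals.
        return False


nil = Nil()

# Analogous to: result = [someObject someMethod:someArgument]
some_object = nil
result = some_object.someMethod("someArgument")

if not result:
    print("the call fell through and returned something nil-like")
```

Plain Python goes the other way: calling a method on `None` raises an `AttributeError` instead of quietly doing nothing.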

Anyway, I was having a chat with a nerd friend of mine at work, and we both got curious if this behavior dated back to the *original* Objective-C runtime or if it was added at some point. With the entire Internet at our fingertips surely this could not be that hard to figure out.

So I poked around, and the earliest reference that I could find with straightforward searches was a reference to this behavior on USENET in 1994:

You can read that here.

I did manage to find a copy of the original Objective-C book on archive.org, but there is no mention of this behavior in that book.

I also found this historical article from the ACM last year but it also did not specifically talk about the `nil`-messaging behavior.

Then I realized I should search bitsavers.org but I really wasn’t sure what to look for and the site was loading slowly. Disappointed, and feeling lazy, I decided to see what twitter thought:

This, it turned out, was the perfect thing to use the giant nerd village for.

Within a few minutes there was confirmation that the behavior certainly existed and was documented by 1995

Then later we got back to 1993

There was runtime nerding by runtime nerds.

There was also funny runtime snarking by the same runtime nerds. This tweet, by the way, is true. I have seen such code and will leave it at that.

Then, I got this message which won the day with a reference back to a post in the Squeak forums, of all things. So now we have the following facts:

The long post verifies that the original Objective-C runtimes threw an error when told to send messages to `nil`, and that this was changed to the current fall-through behavior in a release of some software called “ICPack 201”. This package was released by a company called Stepstone, which originally developed and owned the language in the 80s.

The only information about this company that I could find on the Internet was the wikipedia entry which mentions “ICPack 201” but does not say when it was released. But, it *does* say that the package was proposed to the Open Software Foundation when they did their Request for Technology for their window management system, the software that eventually became *Motif* (shudder).

Now, the wikipedia entry for Motif says that the RFT for the OSF window manager happened in 1988, so this means that “ICPack 201” must have shipped sometime around 1988. Hooray!

Finally, a few hours later I got this reference to more NeXT documentation from 1990. Of course this manual is on bitsavers, like I figured it would be.

While in the end I could have found it myself, the great nerd convergence around this question was kinda fun.

But, don’t let this apparently heartwarming story change your mind about twitter. It’s still a cesspool that should mostly be avoided.

But once in a while it’s not too bad.

It turns out that I did find this stackoverflow thread in my first few searches, and it also has the reference to the Squeak post. But I missed it my first time through. So there you go. If I had had better eyes I would have missed out on a minor twitter storm.

I have this long standing problem with the iTunes/iTunes Store/Apple Music catalog system. I can summarize it in one screen shot.

Open up Apple Music on your Mac or iOS device, it does not matter, and in the search field try and search for “Martinů viola”. Martinů was an early 20th century Czech composer of modern classical music that I particularly enjoy. In particular he wrote a lot of interesting music for viola. Anyway, here are the results you will get:

Anyone who knows me has heard me complain about this behavior. My emotional state about it swings from amusement, to detached fatalism, and occasionally to unhinged anger.

But, I have a guess about why it happens.

The thing is, the iTunes (and thus Apple Music) catalog model is built on *songs*. So the search engine is, naturally, also oriented towards finding *songs*. So if the data you want is not in the *song*, and in particular the *song title* or the *main artist*, iTunes will not find it.

But, the classical music albums (and to some extent jazz, as well) defeat this assumption by not putting enough useful metadata in the songs. The titles do not contain the composer. The main artist is usually the performer, and not the composer. The composer is shuttled off to its own field of the song record, which I assume is either not indexed at all, or not weighted heavily when evaluating the relevance of the search results. Here is a modest example:

Here Martinů appears in the album title and in the composer fields, but never in the places that are important: song titles and the main artist field. The result is that when I search for “Martinů” in the general Apple Music catalog, Apple Music does not find high relevancy hits for the name “Martinů” anywhere, but it *does* find good hits for various “corrections” of the spelling of the name. So it thinks I am confused about the name and it shows me that stuff instead. The only way to find this record is to search on the performer’s name. But that assumes that I already knew that I wanted this specific performance, and is thus useless if I want to see all of the available performances of some particular thing.

When I search my *local* catalog for the same string, it does better. Partly this is because I have adjusted my local metadata, partly this is because my local catalog does not have as much music by popular artists whose name is a short edit distance away from “Martinů”.
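To make the “short edit distance” idea concrete, here is a minimal sketch of the classic Levenshtein distance in Python. This is only an illustration: I have no idea what Apple’s search engine actually does internally, and the candidate strings are just examples I picked.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


query = "martin\u016f"  # "martinů", with ů as a single precomposed character
for candidate in ["martina", "martin", "martin\u016f"]:
    print(candidate, edit_distance(query, candidate))
```

Both “martina” and “martin” are a single edit away from the query, so a search engine that trusts spelling correction more than the literal text has plenty of popular names to “correct” to.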

The fix for this would be to either fix the search engine (unlikely) or fix the data (also unlikely). In particular, if we rewrote all of the Apple Music meta-data to follow this rule from my list of iTunes Rules, then things would be better:

The second rule is: the iTunes data model is fairly simple, so aggressively de-normalize the data. This is especially true for Classical music where the single artist single song model really breaks down. If you are not careful, you’ll go and browse albums or songs on the iPod and see 50,000 titles called “String Quartet XYZ in B Major” and so on. This is useless. The solution is to put the key artist or composer in every field of the database so they will show up in all major views in both iTunes and on the iPod. Of course, you have to do some work to be careful and keep your de-normalized formats as uniform as possible. Life is hard.

Basically, we should spam *all* of the database fields with the composer name … but especially the song titles and main artist fields. Those are the only bits that the search engine really seems to care about (which kinda makes sense), so we should tune the meta-data to take advantage of this.
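As a rough sketch of what that spamming could look like, here is a small Python function that copies the composer into the title and artist fields. The field names (`title`, `artist`, `composer`) and the example record are hypothetical; this is not the real iTunes or Apple Music schema.

```python
def denormalize(track: dict) -> dict:
    """Return a copy of track with the composer's name copied into the
    fields a song-oriented search is most likely to index."""
    composer = track.get("composer", "").strip()
    if not composer:
        return dict(track)
    fixed = dict(track)
    # Only add the name where it is missing, so running this twice is harmless.
    if composer not in fixed.get("title", ""):
        fixed["title"] = f"{composer}: {fixed.get('title', '')}"
    if composer not in fixed.get("artist", ""):
        fixed["artist"] = f"{composer}; {fixed.get('artist', '')}"
    return fixed


# A made-up record, for illustration only.
track = {
    "title": "Rhapsody-Concerto: I. Moderato",
    "artist": "Example Violist",
    "composer": "Martinů",
}
print(denormalize(track))
```

The point of the “only if missing” checks is the uniformity mentioned in the rule above: you want to be able to re-run the cleanup over the whole library without piling up duplicate names.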

Then maybe people would start complaining that they get some Martinů viola concerto album when they were searching for their favorite Martina McBride record. Or the Halo 2 soundtrack.

Two more notes:

Spotify seems to be better at this, as it appears to realize that the name “Martinů” is important in this context so it does not weight name corrected hits as highly. If you dig a bit deeper into their search hits you can still see some silliness, but it’s not as bad. This is why I sometimes keep Spotify around for random explorations of different areas of music. But for the most part I find their app annoying to use.

The best way to do these kinds of searches is to type, for example,

`Martinů site:music.apple.com`

into your favorite search engine, because just matching the literal text ends up working better than the various name correction heuristics in the music service search engines.

The James Webb Space Telescope finally took off into orbit today. In the lead up to the launch I had read up some of the background for this device, and let me just say that what they are trying to do with this telescope is super ultra bonkers.

The super short summary is: fold a telescope mirror that is roughly six times the area of the Hubble *and* a giant foil sun-shield that is even larger into a (relatively) tiny little tube and shoot it into an orbit a million miles from the Earth where it unfolds by remote control. The sun-shield keeps the sun off of it so it can stay cold enough to collect the infrared light that its mirror is optimized for.

At this point my brain twitched a bit. Infrared? Why optimize a telescope for infrared?

Well. You may recall the story I told once about telescopes and spectral lines. See, it turns out that when you shine light through a cloud of glowing gas (say) and look at it with an instrument called a *spectrograph* that breaks the light up into a spectrum, the spectrum will be dotted with either dark or bright lines at very specific frequencies. Like this:

The mechanism that causes these lines to appear where they do had no explanation until the early 20th century, and was one of the founding problems of quantum mechanics. But that’s a story for a future article.

The spectral lines of hydrogen and other atoms have taught us more than you might be able to imagine about how the universe works and what it is made out of. As I also mentioned in my older article, one of the most important things they tell us is that the universe is expanding. Let us review.

Since light, in some ways, acts like a wave, it is subject to an effect called the Doppler shift. This is (sort of) the same physics that makes the pitch of a sound seem to shift up (shorter wavelengths) if the source of the sound is moving towards you and down (longer wavelengths) if it is moving away from you.

When you take the spectrum of an object that is emitting light, and that object is moving away from you very quickly, you will notice that all of the spectral lines shift towards the red part of the rainbow. In the 1920s Edwin Hubble used this fact to show that the universe is expanding by collecting light from far away galaxies and noting that the further away the object was, the stronger the red shift. This fact combined with other evidence like the existence of the cosmic microwave background radiation has led to all of our current cosmological models for the overall evolution of the universe.

With this context, the motivation for building a giant IR telescope is more clear. If you make the mirror big enough to see things even further away than the Hubble can see, then the red shifts will be even stronger and will eventually shift the light completely out of the visible and into the infrared. So, what the JWST will do is fly in an orbit a million miles from the Earth and sit behind a giant piece of tin foil blocking all the light and residual IR warmth coming from the Earth, the Moon and the Sun. Then it will point its giant mirror into the great void hoping to collect light that is more than 13 billion years old. This will give us a view of the oldest and most distant stuff that we can currently observe in the universe. These things are so far away and so old that you can’t really imagine it.
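To put rough numbers on this, here is a small Python sketch using the standard relation \lambda_{obs} = (1 + z)\lambda_{rest}, where z is the redshift. Picking the hydrogen Lyman-alpha line and these particular redshift values is my own choice for illustration; none of it comes from the JWST design documents.

```python
LYMAN_ALPHA_NM = 121.567  # rest wavelength of hydrogen Lyman-alpha, in nm


def observed_wavelength_nm(rest_nm: float, z: float) -> float:
    """Wavelength seen by the observer for light emitted at rest_nm
    by a source with redshift z: lambda_obs = (1 + z) * lambda_rest."""
    return (1.0 + z) * rest_nm


# Illustrative redshifts only; larger z means further away and further back in time.
for z in [0, 1, 5, 10]:
    print(z, round(observed_wavelength_nm(LYMAN_ALPHA_NM, z), 1))
```

The line starts out in the ultraviolet, and by z = 10 it sits at about 1337 nm, well past the red end of the visible range (roughly 700 nm) and into the near infrared.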

Here: the stuff is three or four times as old as the current age of the solar system, which is already older than you can possibly imagine. See?

Anyway, if all goes well over the next few months there will be a new robot telescope sitting in space continuing the grand tradition in modern science of being able to look at things that we can’t even begin to see or imagine in our every day existence. This is both the basic tool and the ultimate allure of science. I of course ended up working with computers, which are nothing if not strange and invisible puzzles. But they don’t tell you too much about the big questions, like how the universe began. At least not directly. I still like to try and keep up with astrophysics though. The ability of humans to look at and learn about things that are so far beyond what we can see around us in our immediate “real” world is the only reason to have any real hope for the future these days. So let’s hope it works.

So we’re coming close to the end of year *two* of the great dark stupidity. I don’t have a pithy end of year post in me this time. It requires too much energy to generate something like that, especially given the general weight of narcissistic self-interested bullshit that hangs so heavily over everything.

As I’ve said before, I am *personally* fine, but the world overall does not show many signs of macroscopic improvement. Though I would be negligent if I did not show at least a bit of appreciation for the few small tastes of real life we had in the summer and early fall. Concerts even. Those were good.

Here’s hoping the pieces get bigger over time.

I’m not optimistic though … let’s let tiktok, of all things, illustrate things.

Today a few things that I’ve added, or improved, to my home kitchen repertoire since the great darkness began. No sourdough bread here, sorry.

OK. This is a bit of a cheat since I’ve already mentioned this in my ode to ma-po tofu in 2020 earlier. Still, it deserves another mention. Make ma-po tofu, put it on tater tots.

That’s all.

I stole this idea from this guy, but tweaked it a bit for my own style.

You can do this in an Instant Pot (best), rice cooker (also good) or on the stove (probably fine, but tedious). Put the following things in the pot:

1 rice cooker measuring cup of rice. These are smaller than an actual cup measure. But I don’t know by how much.

6 to 7 cups of water, or water and chicken stock.

A bit of “chicken powder”, or MSG, or both.

5 or 6 big slices of ginger.

1 4-6oz piece of frozen white fleshed fish fillet (cod or whatever). This is the one sort of fish I buy from Whole Foods.

Turn the pot on. For the Instant pot I run a “porridge” cycle which is 10-15min at high pressure. You should run whatever cycle is appropriate for your device. When the pot is done, stir the rice around. Add some soy sauce, salt, white pepper, and green onion. Now you have hot breakfast for a week. Throw a soft egg on top for an extra bonus.

Gently poached chicken has been a staple of mine for years. But I thought I’d make an extra shout out for it now for two reasons:

More people need to see this. This is so much easier and better (in some ways) than roast chicken. I don’t know why it’s not more of a thing.

This technique plays a big part in Dave Chang’s new cookbook. So I gotta ride that train too.

Here is what you do. First, buy a medium sized whole chicken. 3 to 4 pounds is ideal.

Now fill a pot with enough water to cover the chicken. Bring it to a soft boil and dunk the chicken in. If you troll around youtube you’ll see a technique involving multiple dunks. But you don’t need to do this. Just drop the chicken in the water and turn the heat to low. Eventually the water will come back to a gentle simmer. Cover the pot and let the chicken sit like that for, say, twenty minutes to an hour depending on how cooked you want the meat. Then turn the heat off and let the chicken sit some more.

When you are done you’ll have beautifully tender, soft and moist chicken meat. Pull it all off the carcass and put it in the bowl to drain and dry off. If some of the meat is not completely cooked, don’t worry. You can finish it off later when you use it in some other dish.

Now take the mostly bare bones and put them back in the pot with onion, celery, carrots and whatever else you want (or, for a more Chinese/East Asian style stock, just some celery, ginger and green onion) and simmer that for another couple of hours. Pour out into pint containers and stick them in the freezer.

Now you can make at least two meals in any number of ways.

For chicken rice, take some rice and put it in the rice cooker with the stock. Cook at normal, serve with the poached meat from above and chili sauce.

For chicken soup saute some onion, celery and carrots in a soup pot on medium heat until they get soft. Then add stock, water, white wine, your favorite greens, and some of the cut up meat. Bring up to a boil and then simmer it for a while. Finish with salt, white pepper, fish sauce or soy sauce and a bit of MSG.

For an extra umami bomb soup try the mushroom soup.

Add the cooked meat to your congee for extra protein.

For chicken and biscuits … start the same way as the chicken soup, but only add a cup or two of stock and then thicken the whole thing with a “bechamel” that you make from half a stick of butter, 4 tablespoons of flour and a cup of milk. Cook that until it’s nice. Serve on top of biscuits.

The possibilities are endless.

Extra note: the skin on a poached chicken seems to be a thing people disagree about. Chinese people seem to love it. Others, not so much. More for me.

This is the single best cooking idea I’ve had in ten years.

The NYC egg and cheese (or, more correctly, baconeggandcheese, one word) is a particular class of sandwich that is hard to find other places. A few places in Pittsburgh try to do breakfast sandwiches but mess them up by fussing too much over the bread, or by doing them on horrible biscuits, or committing any number of other sins. But, one local place has done this really well: The Pear and the Pickle. But, the pandemic has been hard on them, and it’s not clear whether they are going to come back next year.

So I have been experimenting with this dish at home, especially after noting that I could buy the same rolls that P&P use in their shop (the rolls are very important). So here is what I do.

You need: a bag of Mancini’s Egg Kaiser rolls, one egg, some bacon, and one slice of Kraft American cheese.

Cook some bacon however you like to. I make it crispy in a pan, or microwave.

Make a soft fried egg with a bit of salt and pepper. You don’t need too much salt, since the Kraft Single and the bacon will be salty.

While the egg is cooking (for 3 or 4 min) cut a roll in half, put the Kraft Single on one half and put both halves in your toaster oven and lightly toast them. It takes my toaster just about 4min to get the roll to the right state and melt the cheese a bit. You don’t want it all the way toasted … just a bit, to dry out the upper layers of the roll and make it a bit squishy.

Now lightly squish both halves of the roll and then put in the egg and bacon to construct the sandwich. Cut in half and enjoy.

Our old friends from the Thin Man sandwich shop did a popup today in the space that Black Radish uses as a catering kitchen. Thin Man closed in 2017, a casualty of skyrocketing rents in the Strip District. But while they were open they were easily doing some of the best sandwiches in town.

Their signature eponymous sandwich, the *Thin Man*, is a thick schmear of a great chicken liver mousse, endive, and bacon on a crusty roll:

Today they also had a spicy Vietnamese style beef meatball sub, that was great.

Anyway. I post this just as an extended shout out that does not fit well into our currently shitty social media channels. And also to point out that I realized at this event that in my earlier missive about perfect sandwiches I had left Thin Man off the list. And I should not have. Their stuff is great, and as good as I’ve ever had. Though the bread might be a bit too fancy. And don’t sleep on the chicken liver mousse. Get thee to the next popup and get some, if you can.

In a fit of nerd cliché, I spent the last month or two trying to understand the *Yoneda Lemma*. It turned out that what I really needed to do was to figure out how every different writer comes up with their own strange notation to write the result down. So of course I wrote a document explaining this to myself. In an equally predictable twist, to do this I made up my own notation for everything. But I list most of the others too, since that was the point.

Then I translated the \LaTeX into markdown (mostly with pandoc, I’m not an idiot) and added this blurb. So now you can read it here too. This page was the inevitable result of making a web site that can render \TeX. So I might as well own it.

But, the pdf looks much better: so you should read that instead.

**Note**: I am not a mathematician or a category theory expert. I just wrote this down trying to figure out the language. So everything in this document is probably wrong.

The Yoneda Lemma is a basic and beloved result in category theory. Even though it is called a “lemma”, a word usually used to describe a minor result that you prove on the way to the main event, the Yoneda lemma *is* a main event. It is a result that expresses one of the main goals of category theory: it characterizes universal facts about general abstract constructs.

Its statement is deceptively simple [8]:

Let \mathbf{C} be a locally small category. Let X be an object of \mathbf{C}, and let F: \mathbf{C}\to {\mathbf {Sets}} be a functor from \mathbf{C} to the category {\mathbf {Sets}}. Then there is an invertible mapping \mathop{\mathrm{\mathit{Hom}}}(\mathbf{C}(X, -),F) \cong FX that associates each natural transformation \alpha:\mathbf{C}(X,-) \Rightarrow F with the element \alpha_X(1_X) \in FX. Moreover, this correspondence is natural in both X and F.

But as Sean Carroll famously wrote about general relativity, “…, these statements are incomprehensible unless you sling the lingo” [1].

I am going to do the following dumb thing: having stated a version of the lemma above I’m going to define only the parts of the category theory needed to explain what the lingo means. There are five or six layers of abstraction that I will try to explain. As for the larger meaning of the result itself, you are on your own. I won’t explain that, or even really show you how the proof goes.

In the spirit of video game speedruns [6], we will skip entire interesting areas of category theory in the name of getting to the end of our “game” as fast as possible. Clearly this will be no substitute for really learning the subject. Any of the references listed at the end will be a good place to start to better understand the whole game.

**Note**: Again, I am not a mathematician or a category theory expert. I just wrote this down trying to figure out the language. So everything in this document is probably wrong.

Categories have a deliciously chewy multi-part definition.

**Definition 1**. A *category* \mathbf{C} consists of:

A collection of *objects* that we will denote with upper case letters X, Y, Z, ..., and so on. We call this collection \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}). Traditionally people write just \mathbf{C} to mean \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}) when the context makes clear what is going on.

A collection of *arrows* denoted with lower case letters f, g, h, ..., and so on. Other names for *arrows* include *mappings* or *functions* or *morphisms*. We will call this collection \mathop{\mathrm{\mathit{Arrows}}}(\mathbf{C}).

The objects and arrows of a category satisfy the following conditions:

Each arrow f connects one object A \in \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}) to another object B \in \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}) and we denote this by writing f: A \to B. A is called the *source* (or *domain*) of f and B the *target* (or *codomain*). Source and target are somewhat more intuitive terms, but domain and codomain connect the language to functions in other areas of mathematics.

For each pair of arrows f:A \to B and g : B \to C we can form a new arrow g \circ f: A \to C called the *composition* of f and g. This is also sometimes written gf.

For each A \in \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}) there is an arrow 1_A: A \to A, called the *identity* at A that maps A to itself. Sometimes this arrow is also written as \mathrm{id}_A.

Finally, we have the last two rules:

For any f: A \to B we have that 1_B \circ f and f \circ 1_A are both equal to f.

Given f: A \to B, g: B \to C, h: C\to D we have that (h \circ g) \circ f = h \circ (g \circ f), or alternatively (hg)f = h(gf). What this also means is that we can always just write hgf if we want.
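If you think better in code, here is a minimal Python sketch of these rules, where the objects are (informally) Python types and the arrows are ordinary functions. The names `identity` and `compose` and the three sample arrows are my own choices for illustration. Note that the assertions only spot-check the two laws on a few inputs; they are not a proof.

```python
def identity(x):
    """The identity arrow 1_A: returns its argument unchanged."""
    return x


def compose(g, f):
    """The composition g ∘ f: apply f first, then g."""
    return lambda x: g(f(x))


# Three sample arrows: f: int -> int, g: int -> str, h: str -> int.
f = lambda n: n + 1
g = lambda n: str(n)
h = lambda s: len(s)

samples = [0, 9, 99, 12345]

# Identity laws: 1_B ∘ f = f = f ∘ 1_A.
assert all(compose(identity, f)(x) == f(x) == compose(f, identity)(x)
           for x in samples)

# Associativity: (h ∘ g) ∘ f = h ∘ (g ∘ f).
assert all(compose(compose(h, g), f)(x) == compose(h, compose(g, f))(x)
           for x in samples)
```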

We will call the collection of all arrows from A to B \mathop{\mathrm{\mathit{Arrows}}}_{\mathbf{C}}(A, B). We will usually write \mathop{\mathrm{\mathit{Arrows}}}(A,B) when it’s clear what category A and B come from. People also write \mathop{\mathrm{\mathit{Hom}}}(A, B) or \mathop{\mathrm{\mathit{Hom}}}_{\mathbf{C}}(A,B), or \mathop{\mathrm{\mathit{hom}}}(A, B) or just \mathbf{C}(A,B) to mean \mathop{\mathrm{\mathit{Arrows}}}(A,B). Here “\mathop{\mathrm{\mathit{Hom}}}” stands for homomorphism, which is a standard word for mappings that preserve some kind of structure. Category theory, and the Yoneda lemma, it turns out, is mostly about the arrows.

I have broken with well established tradition in mathematical writing and mostly spelled out names for clarity rather than engaging in the strange and random abbreviations that I see in most category theory texts. The general fear of readable names in the mathematical literature is fascinating to me, having spent most of my life trying to think up readable names in program source code. Life is too short to deal with names like \mathop{\mathit{ob}}, or \mathbf{Htpy}, or \mathbf{Matr}. Luckily, for this note the only specific category that we will run into is the straightforwardly named {\mathbf {Sets}}, where the objects are sets and the arrows are mappings between sets.

Speaking of sets, in the definition of categories we were careful about not calling anything a *set*. This is because some categories involve collections of things that are too “large” to be called sets and not get into set theory trouble. Here are two more short definitions about this that we will need.

**Definition 2**. A category \mathbf{C} is called *small* if \mathop{\mathrm{\mathit{Arrows}}}(\mathbf{C}) is a set.

**Definition 3**. A category \mathbf{C} is called *locally small* if \mathop{\mathrm{\mathit{Arrows}}}_{\mathbf{C}}(A,B) is a set for every A, B \in \mathbf{C}.

For the rest of this note we will only deal with locally small categories, since in the setup for the lemma, we are given a category \mathbf{C} that is locally small.

Finally, one more notion that we’ll need later is the idea of an *isomorphism*.

**Definition 4**. An arrow f: X \to Y in a category \mathbf{C} is an *isomorphism* if there exists an arrow g: Y \to X such that gf = 1_X and fg = 1_Y. We say that the objects X and Y are *isomorphic* to each other whenever there exists an isomorphism between them. If two objects in a category are isomorphic to each other we write X \cong Y.

Note that in the category {\mathbf {Sets}} the isomorphisms are exactly the invertible mappings between sets. An invertible mapping is also called a *bijection* (because it’s injective and surjective, you see), so you will see that word sometimes.
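Here is a small Python sketch of that check for finite sets, with mappings represented as lookup tables (dicts). The function name `is_inverse_pair` and the example sets are hypothetical, chosen just to make the definition concrete.

```python
def is_inverse_pair(f: dict, g: dict, X: set, Y: set) -> bool:
    """True if f: X -> Y and g: Y -> X (given as lookup tables) satisfy
    g ∘ f = 1_X and f ∘ g = 1_Y."""
    # f must be defined exactly on X, and g exactly on Y.
    if set(f) != X or set(g) != Y:
        return False
    return (all(f[x] in Y and g[f[x]] == x for x in X)
            and all(g[y] in X and f[g[y]] == y for y in Y))


X = {"a", "b", "c"}
Y = {1, 2, 3}
f = {"a": 1, "b": 2, "c": 3}
g = {y: x for x, y in f.items()}  # flip the table to get the inverse
assert is_inverse_pair(f, g, X, Y)

# A mapping that collapses two elements has no inverse.
collapse = {"a": 1, "b": 1, "c": 2}
assert not is_inverse_pair(collapse, {1: "a", 2: "c"}, X, {1, 2})
```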

As we navigate our way from basic categories up to the statement of the lemma we will travel through multiple layers of conceptual abstraction. At the base of this ladder are the categories which themselves are already an abstraction of the many ways that we express “mathematical structures”. But we have much higher to climb. Functors are the first step up.

Functors are the *arrows between categories*. That is, if you were to define a category where the objects were all categories of some kind then the arrows would be functors.

**Definition 5**. Given two categories \mathbf{C} and \mathbf{D} a *functor* F : \mathbf{C}\to \mathbf{D} is defined by two sets of parallel rules. First:

For each object X \in \mathbf{C} we assign an object F(X) \in \mathbf{D}.

For each arrow f: X \to Y in \mathbf{C} we assign an arrow F(f): F(X) \to F(Y) in \mathbf{D}.

So F maps objects in \mathbf{C} to objects in \mathbf{D} and also arrows in \mathbf{C} to arrows in \mathbf{D} such that the sources and targets match up the right way. That is, the source of F(f) is F applied to the source of f, and the target of F(f) is F applied to the target of f. In addition the following must be true:

If f:X \to Y and g: Y \to Z are arrows in \mathbf{C} then F(g \circ f) = F(g) \circ F(f) (or F(gf) = F(g)F(f)).

For every X \in \mathbf{C} it is the case that F(1_X) = 1_{F(X)}.

Thus, the mappings that make up a functor preserve all of the structure of the source category in its target, namely the sources and targets of arrows, composition, and the identities.
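A familiar example from programming is the “list” functor, sketched here in Python. On objects it sends a type X to the type of lists of X; on arrows it sends f to “apply f to every element”. The names `fmap` and `compose` are my own, and the assertions only spot-check the two functor laws on one sample list.

```python
def identity(x):
    return x


def compose(g, f):
    """g ∘ f: apply f first, then g."""
    return lambda x: g(f(x))


def fmap(f):
    """Arrow part of the list functor: lift f: X -> Y to
    F(f): list[X] -> list[Y] by applying f to every element."""
    return lambda xs: [f(x) for x in xs]


f = lambda n: n + 1   # f: int -> int
g = str               # g: int -> str
xs = [1, 2, 3]

# F(g ∘ f) = F(g) ∘ F(f)
assert fmap(compose(g, f))(xs) == compose(fmap(g), fmap(f))(xs)

# F(1_X) = 1_{F(X)}
assert fmap(identity)(xs) == identity(xs)
```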

If F: \mathbf{C}\to \mathbf{D} is a functor from a category \mathbf{C} to another category \mathbf{D}, X \in \mathbf{C} is an object in \mathbf{C}, and f: X \to Y is an arrow in \mathbf{C} we may write F X to mean F(X) and Ff to mean F(f). This is analogous to the more compact notation for composition of arrows above.

Functors can be notationally confusing because we are using one name to denote two mappings. So if F: \mathbf{C}\to \mathbf{D} and X \in \mathbf{C} then F(X) is the functor applied to the object, which will be an object in \mathbf{D}. On the other hand, if f : A \to B is an arrow in \mathbf{C} then we also write F(f) \in \mathbf{D} for the functor applied to the arrow. This makes sense but can be a little weird. Sometimes in proofs and calculations the notations will shift back and forth without enough context and can be disorienting.

Natural transformations are the next step up the ladder. If functors are arrows between categories, then natural transformations are arrows between functors.

**Definition 6**. Let \mathbf{C} and \mathbf{D} be categories, and let F and G be functors \mathbf{C}\to \mathbf{D}. To define a *natural transformation* \alpha from F to G, we assign to each object X of \mathbf{C}, an arrow \alpha_X:FX\to GX in \mathbf{D}, called the *component* of \alpha at X.

In addition, for each arrow f:X\to Y of \mathbf{C}, the following diagram has to commute:

This is the first commutative diagram that I’ve tossed up. There is no magic here. The idea is that you get the same result no matter which way you travel through the diagram. So here \alpha_Y \circ F(f) and G(f) \circ \alpha_X must be equal as arrows FX \to GY.
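Here is a Python sketch of one concrete natural transformation, assuming F is the list functor and G is an “optional” functor where a missing value is modeled as `None`. All of the names are mine. The component at every X is the same function, `safe_head`, and the loop checks both paths around the square on a few sample lists.

```python
def list_map(f):
    """F(f): apply f to every element of a list."""
    return lambda xs: [f(x) for x in xs]


def optional_map(f):
    """G(f): apply f to the value if there is one, else pass None through."""
    return lambda x: None if x is None else f(x)


def safe_head(xs):
    """Component alpha_X: list[X] -> Optional[X]. The code is the same
    for every X, which is the informal sense in which it is 'natural'."""
    return xs[0] if xs else None


f = lambda n: n * 10  # an arrow f: int -> int

for xs in ([], [1], [1, 2, 3]):
    # alpha_Y ∘ F(f) == G(f) ∘ alpha_X
    assert safe_head(list_map(f)(xs)) == optional_map(f)(safe_head(xs))
```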

We write natural transformations with double arrows, \alpha: F \Rightarrow G, to distinguish them in diagrams from functors (which are written with single arrows):

You might wonder to yourself: what makes natural transformations “natural”? The answer appears to be related to the fact that you can construct them from *only* what is given to you in the categories at hand. The natural transformation takes the action of F on \mathbf{C} and lines it up exactly with the action of G on \mathbf{C}. No other assumptions or conditions are needed. In this sense they define a relationship between functors that is just sitting there in the world no matter what, and thus “natural”. Another apt way of putting this is that natural transformations give a canonical way of moving between the images of two functors [3].

As with arrows, it will be useful to define what an isomorphism means in the context of natural transformations:

**Definition 7**. A *natural isomorphism* is a natural transformation \alpha: F \Rightarrow G in which every component \alpha_X is an isomorphism. In this case, the natural isomorphism may be depicted as \alpha: F \cong G.

In the last two sections we have defined functors, and then the natural transformations. Given that functors and natural transformations look a lot like objects and arrows, the next obvious thing is to use them to make a new kind of category.

**Definition 8**. Let \mathbf{C} and \mathbf{D} be categories. The *functor category* from \mathbf{C} to \mathbf{D} is constructed as follows:

The objects are functors F: \mathbf{C}\to \mathbf{D};

The arrows are natural transformations \alpha:F\Rightarrow G.

Right now you should be wondering to yourself: “wait, does this definition actually work?” I have brazenly claimed without any justification that it’s OK to use the natural transformations as arrows. Luckily it’s fairly clear that this works out if you just do everything component-wise. So if we have all of these things:

Three functors, F: \mathbf{C}\to \mathbf{D} and G: \mathbf{C}\to \mathbf{D} and H:\mathbf{C}\to \mathbf{D}.

Two natural transformations \alpha: F \Rightarrow G and \beta: G \Rightarrow H

One object X \in \mathbf{C}.

Then you can define (\beta \circ \alpha)_X = \beta_X \circ \alpha_X and you get the right behavior. Similarly, the identity transformation 1_F can be defined component-wise: (1_F)_X = 1_{F(X)}.
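As a sketch of component-wise composition in Python, take F and G to both be the list functor and H to be an “optional” functor that uses `None` for a missing value. These choices, and every name below, are my own assumptions for illustration. Composing “reverse the list” with “take the first element, if any” gives a new natural transformation from lists to optionals.

```python
def list_map(f):
    return lambda xs: [f(x) for x in xs]


def optional_map(f):
    return lambda x: None if x is None else f(x)


def reverse(xs):
    """alpha_X: list[X] -> list[X]"""
    return xs[::-1]


def safe_head(xs):
    """beta_X: list[X] -> Optional[X]"""
    return xs[0] if xs else None


def vertical_compose(beta, alpha):
    """The component of beta ∘ alpha is beta's component after alpha's."""
    return lambda x: beta(alpha(x))


safe_last = vertical_compose(safe_head, reverse)

f = lambda n: n * 10
for xs in ([], [1], [1, 2, 3]):
    # The composite still satisfies the naturality condition.
    assert safe_last(list_map(f)(xs)) == optional_map(f)(safe_last(xs))
```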

There are a lot of standard notations for the functor category, none of which I really like. The most popular seems to be [\mathbf{C}, \mathbf{D}], but you also see \mathbf{D}^{\mathbf{C}}, and various abbreviations like \mathop{\mathit{Fun}}(\mathbf{C},\mathbf{D}) or \mathop{\mathit{Func}}(\mathbf{C},\mathbf{D}), or \mathop{\mathit{Funct}}(\mathbf{C},\mathbf{D}). I think we should just spell it out and use \mathop{\mathrm{\mathit{Functors}}}(\mathbf{C},\mathbf{D}). So there.

Now we can define this notation:

**Definition 9**. Let \mathbf{C} and \mathbf{D} be categories, and let F, G \in \mathop{\mathrm{\mathit{Functors}}}(\mathbf{C}, \mathbf{D}). Then we’ll write \mathop{\mathrm{\mathit{Natural}}}(F, G) for the set of all natural transformations from F to G, which in this context is the same as the arrows from F to G in the functor category.

You will also see people write \mathop{\mathrm{\mathit{Hom}}}(F, G), \mathop{\mathrm{\mathit{Hom}}}_{[\mathbf{C},\mathbf{D}]}(F,G), or [\mathbf{C},\mathbf{D}](F,G) for this. Or, if \mathbf{K} is a functor category then people will write \mathop{\mathrm{\mathit{Hom}}}_{\mathbf{K}}(F,G) or \mathbf{K}(F,G) for this.

The next conceptual step that we need is a way to relate *functors* to *objects*. The following definition is a natural way to do this once you see how it works but is also probably the most confusing definition in these notes.

**Definition 10**. Given a locally small category \mathbf{C} and an object X \in \mathbf{C} we define the functor
\mathop{\mathrm{\mathit{Arrows}}}(X,-) : \mathbf{C}\to {\mathbf {Sets}} using the following assignments:

A mapping from \mathbf{C}\to {\mathbf {Sets}} that assigns to each Y \in \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}) the set \mathop{\mathrm{\mathit{Arrows}}}(X,Y)

A mapping from \mathop{\mathrm{\mathit{Arrows}}}(\mathbf{C}) \to \mathop{\mathrm{\mathit{Arrows}}}({\mathbf {Sets}}) that assigns to each arrow f: A \to B a mapping f_*: \mathop{\mathrm{\mathit{Arrows}}}(X,A) \to \mathop{\mathrm{\mathit{Arrows}}}(X,B) defined by f_*(g) = f\circ g for each arrow g: X \to A.

The notation \mathop{\mathrm{\mathit{Arrows}}}(X,-) needs a bit of explanation. Here the idea is that we have defined a mapping with two arguments, but then fixed the object X. Then we use the “-” symbol as a placeholder for the second argument. So \mathop{\mathrm{\mathit{Arrows}}}(X,Y) is the value of the mapping as we vary the second argument through all the other objects in \mathbf{C}. This is a bit of an abuse of notation since we are apparently using the symbol \mathop{\mathrm{\mathit{Arrows}}} to mean two different things (one is a set, the other a functor). Oh well.

The definition of the mapping for arrows also needs a bit of explanation. Given A,B \in \mathbf{C} and an arrow f: A \to B, it should be the case that \mathop{\mathrm{\mathit{Arrows}}}(X,-) applied to f is an arrow that maps \mathop{\mathrm{\mathit{Arrows}}}(X,A) \to \mathop{\mathrm{\mathit{Arrows}}}(X,B). We will call this arrow f_*. If g: X \to A is in \mathop{\mathrm{\mathit{Arrows}}}(X,A) then the value that we want for f_* at g is f_*(g) = (f \circ g): X \to B. This mapping is called the *post-composition* map of f since we apply f *after* g. You also see it written as f \circ -. The *pre-composition* map is then f^* or - \circ f.

Thus, we have worked out that the value of \mathop{\mathrm{\mathit{Arrows}}}(X,-) at f should be the arrow f \circ -. Sometimes you will see this written \mathop{\mathrm{\mathit{Arrows}}}(X, f) = f \circ -, which I find a bit odd because now we are overloading the kinds of things that can go into the “-” slot.

Check over this formula in your head, and note that there are *two* function applications (one for the functor, and one inside that for the post-composition arrow), and two different kinds of placeholder.

Other notations for this functor include \mathop{\mathrm{\mathit{Hom}}}(X, -), \mathop{\mathrm{\mathit{Hom}}}_\mathbf{C}(X, -), H^X, h^X, and just plain \mathbf{C}(X,-). In my notation we should have written this as \mathop{\mathrm{\mathit{Arrows}}}_{\mathbf{C}}(X, -), but I’m lazy. This kind of functor is also called a *hom-functor*.
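The two nested function applications are easier to see in code. In this Python sketch the fixed object X is the type `str`, which only shows up implicitly as the domain of g. The name `hom_from_x` and the sample arrows are hypothetical, and the last assertion only spot-checks functoriality on two inputs.

```python
def compose(g, f):
    """g ∘ f: apply f first, then g."""
    return lambda x: g(f(x))


def hom_from_x(f):
    """Arrows(X, -) applied to an arrow f: A -> B.
    Returns f_* = f ∘ -, which maps each g: X -> A to f ∘ g: X -> B."""
    return lambda g: compose(f, g)


g = len                         # g: str -> int, an element of Arrows(X, A)
f1 = lambda n: n + 1            # f1: int -> int
f2 = lambda n: n % 2 == 0       # f2: int -> bool

# First application: the functor acts on f2. Second: f2_* acts on g.
is_even_length = hom_from_x(f2)(g)   # an arrow str -> bool
assert is_even_length("abcd") is True

# Functoriality: (f2 ∘ f1)_* = (f2)_* ∘ (f1)_*
lhs = hom_from_x(compose(f2, f1))(g)
rhs = hom_from_x(f2)(hom_from_x(f1)(g))
assert all(lhs(s) == rhs(s) for s in ["", "abc"])
```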

Finally, we can give two more important definitions.

**Definition 11**. Given an object X \in \mathbf{C} we call the functor \mathop{\mathrm{\mathit{Arrows}}}(X,-) defined above the functor *represented* by X.

In addition, we can characterize another important relationship between objects and functors:

**Definition 12**. Let \mathbf{C} be a category. A functor F:\mathbf{C}\to{\mathbf {Sets}} is called *representable* if it is naturally isomorphic to the functor \mathop{\mathrm{\mathit{Arrows}}}_\mathbf{C}(X,-):\mathbf{C}\to{\mathbf {Sets}} for some object X of \mathbf{C}. In that case we call X the *representing object*.

Next we move a bit sideways. Duality in mathematics comes up in a lot of different ways. Covering it all is way beyond the scope of these notes. But the following definition is a basic part of category theory so it’s worth including.

**Definition 13**. Let \mathbf{C} be a category. Then we write \mathbf{C}^{\mathrm op} for the *opposite* or *dual* category of \mathbf{C}, and define it as follows:

The objects of \mathbf{C}^{\mathrm op} are the same as the objects of \mathbf{C}.

\mathop{\mathrm{\mathit{Arrows}}}(\mathbf{C}^{\mathrm op}) is defined by taking each arrow f :X \to Y in \mathop{\mathrm{\mathit{Arrows}}}(\mathbf{C}) and flipping its direction, so we put f': Y \to X into \mathop{\mathrm{\mathit{Arrows}}}(\mathbf{C}^{\mathrm op}).

In particular for X, Y \in \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}) we have \mathop{\mathrm{\mathit{Arrows}}}_{\mathbf{C}}(X, Y) = \mathop{\mathrm{\mathit{Arrows}}}_{\mathbf{C}^{\mathrm op}}(Y, X) (or \mathbf{C}(X, Y) = \mathbf{C}^{\mathrm op}(Y, X)).

Composition of arrows is the same, but with the arguments reversed.

The *principle of duality* then says, informally, that every categorical definition, theorem and proof has a dual, obtained by reversing all the arrows.

Duality also applies to functors.

**Definition 14**. Given categories \mathbf{C} and \mathbf{D} a *contravariant* functor from \mathbf{C} to \mathbf{D} is a functor F: \mathbf{C}^{\mathrm op}\to \mathbf{D} where:

We have an object F(X) \in \mathop{\mathrm{\mathit{Objects}}}(\mathbf{D}) for each X \in \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}).

For each arrow f : X \to Y \in \mathop{\mathrm{\mathit{Arrows}}}(\mathbf{C}) we have an arrow F(f): FY \to FX in \mathop{\mathrm{\mathit{Arrows}}}(\mathbf{D}).

In addition

For any two arrows f, g \in \mathop{\mathrm{\mathit{Arrows}}}(\mathbf{C}) where g \circ f is defined we have F(f) \circ F(g) = F(g \circ f).

For each X \in \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}) we have 1_{F(X)} = F(1_X)

Note how the arrows and composition go backwards when they need to. With this terminology in mind, we call regular functors from \mathbf{C}\to \mathbf{D} *covariant*.
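The standard example of a contravariant functor is \mathop{\mathrm{\mathit{Arrows}}}(-, X), which acts on arrows by pre-composition. Here is a rough Python sketch of the composition rule, again pretending that arrows are just Python functions. The names `compose` and `arrows_into_x` are invented for this example, and the equality of the two sides is only spot-checked at a single input rather than proved.

```python
def compose(f, g):
    """Return f ∘ g, the arrow that applies g first and then f."""
    return lambda x: f(g(x))


def arrows_into_x(f):
    """Arrows(-, X) applied to an arrow f: A -> B.

    The result is the pre-composition map f^* = (- ∘ f), which sends
    an arrow h: B -> X to the arrow h ∘ f: A -> X. Note the flip:
    f goes from A to B, but f^* goes from Arrows(B, X) to Arrows(A, X).
    """
    return lambda h: compose(h, f)


# Hypothetical arrows, with every object taken to be the integers.
f = lambda n: n + 1      # f: A -> B
g = lambda n: 2 * n      # g: B -> C
h = lambda n: n - 3      # h: C -> X

lhs = compose(arrows_into_x(f), arrows_into_x(g))(h)   # (F(f) ∘ F(g))(h)
rhs = arrows_into_x(compose(g, f))(h)                  # F(g ∘ f)(h)
print(lhs(5), rhs(5))                                  # 9 9
```

Both sides work out to h \circ g \circ f, which is why the order of F(f) and F(g) has to be swapped relative to the covariant case.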

Now we have all the language we need to look at the statement of the lemma again. So, here is what we wrote down before, more verbosely, and in my notation.

**Lemma 1** (Yoneda). Let \mathbf{C} be a locally small category, F:\mathbf{C}\to {\mathbf {Sets}} a functor, and X \in \mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}). We can define a mapping from \mathop{\mathrm{\mathit{Natural}}}(\mathop{\mathrm{\mathit{Arrows}}}(X, -),F) \to FX by assigning each transformation \alpha: \mathop{\mathrm{\mathit{Arrows}}}(X, -) \Rightarrow F the value \alpha_X(1_X) \in FX. This mapping is invertible and is natural in both F and X.

So now we can break it down:

In principle the natural transformations from \mathop{\mathrm{\mathit{Arrows}}}(X, -) \Rightarrow F could be a giant complicated thing.

But actually it can only be as large as FX. The fact that this mapping is invertible implies that \mathop{\mathrm{\mathit{Natural}}}(\mathop{\mathrm{\mathit{Arrows}}}(X, -),F) and FX are isomorphic (that is, \mathop{\mathrm{\mathit{Natural}}}(\mathop{\mathrm{\mathit{Arrows}}}(X, -),F) \cong FX).

In other words, every natural transformation from \mathop{\mathrm{\mathit{Arrows}}}(X, -) to F is the same as an element of the set FX. In particular, all we need to know is how \alpha_X(1_X) is defined to know how any of the natural transformations are defined.

Which is pretty amazing.
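To make this a bit more concrete, here is a small sketch in Python, under some loud assumptions: the category is sets and functions, F is the list functor (so F(g) maps g over a list), and a natural transformation is modelled as a single Python function that takes an arrow g out of X. Python does nothing to enforce naturality, so that part is on the honor system. All of the names (`fmap`, `to_element`, `from_element`, `alpha`) are made up for the illustration. The inverse direction uses the formula \alpha_A(g) = F(g)(\alpha_X(1_X)), which is what naturality forces.

```python
def fmap(g, xs):
    """Action of the list functor on an arrow g: apply g to every entry."""
    return [g(x) for x in xs]


def identity(x):
    """The identity arrow 1_X."""
    return x


def to_element(alpha):
    """The Yoneda map: send alpha to alpha_X(1_X), an element of FX."""
    return alpha(identity)


def from_element(u):
    """The inverse map: rebuild alpha from u in FX via alpha_A(g) = F(g)(u)."""
    return lambda g: fmap(g, u)


def alpha(g):
    """A sample natural transformation Arrows(int, -) => List."""
    return [g(0), g(1), g(1)]


u = to_element(alpha)        # the element of F(int) that encodes alpha
beta = from_element(u)       # alpha rebuilt from that element alone

g = lambda n: str(n) * 2     # some arrow g: int -> str
print(u)                     # [0, 1, 1]
print(alpha(g) == beta(g))   # True
```

The whole transformation `alpha` is recovered from the one list `[0, 1, 1]`, which is the sense in which the natural transformations can only be as large as FX.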

To write this in the dual language, you just change \mathop{\mathrm{\mathit{Arrows}}}(X, -) to \mathop{\mathrm{\mathit{Arrows}}}(-, X), which switches the direction of all the arrows and the order of composition in the composition maps.

So with that, here are some other ways people write the result, and how their lingo translates to my notational scheme. As one last bit of terminology, in some of the definitions below the word *bijection* is used to mean an invertible mapping.

This statement is due to Tom Leinster [5], and uses the contravariant language.

**Lemma 2** (Yoneda). Let \mathbf{C} be a locally small category. Then
[\mathbf{C}^\mathrm{op},{\mathbf {Sets}}](H_X, F) \cong F(X)
naturally in X \in \mathbf{C} and F \in [\mathbf{C}^\mathrm{op},{\mathbf {Sets}}].

Here [\mathbf{C}^{\mathrm op}, {\mathbf {Sets}}] is the category of functors from \mathbf{C}^{\mathrm op} to {\mathbf {Sets}} and H_X means \mathop{\mathrm{\mathit{Arrows}}}(-,X). The notation [\mathbf{C}^\mathrm{op},{\mathbf {Sets}}](H_X, F) denotes the arrows in the functor category [\mathbf{C}^\mathrm{op},{\mathbf {Sets}}] between H_X and F, so it’s the same as \mathop{\mathrm{\mathit{Natural}}}(H_X, F).

Emily Riehl’s [9] version is what I used at the top:

**Lemma 3** (Yoneda). Let \mathbf{C} be a locally small category and X \in \mathbf{C}. Then for any functor F : \mathbf{C}\to {\mathbf {Sets}} there is a bijection
\mathop{\mathrm{\mathit{Hom}}}(\mathbf{C}(X,-), F) \cong FX
that associates each natural transformation \alpha:\mathbf{C}(X,-) \Rightarrow F with the element \alpha_X(1_X) \in FX. Moreover, this correspondence is natural in both X and F.

Here \mathop{\mathrm{\mathit{Hom}}}(\mathbf{C}(X,-), F) means \mathop{\mathrm{\mathit{Natural}}}(\mathop{\mathrm{\mathit{Arrows}}}(X,-), F). I think this is my favorite “standard” way of writing this.

Peter Smith [11] does this:

**Lemma 4** (Yoneda). For any locally small category \mathbf{C}, object X \in \mathbf{C}, and functor F:\mathbf{C}\to {\mathbf {Sets}} we have \mathop{\mathit{Nat}}(\mathbf{C}(X,-),F) \cong FX both naturally in X \in \mathbf{C} and F \in [\mathbf{C}, {\mathbf {Sets}}].

He uses the [\mathbf{C}, {\mathbf {Sets}}] notation for the functor category, and \mathop{\mathit{Nat}} where we use \mathop{\mathrm{\mathit{Natural}}}.

Paolo Perrone [8] writes the dual version, and uses the standard term "presheaf" to mean a functor from \mathbf{C}^{\mathrm op} to {\mathbf {Sets}}.

**Lemma 5** (Yoneda). Let \mathbf{C} be a category, let X be an object of \mathbf{C}, and let F:\mathbf{C}^\mathrm{op}\to{\mathbf {Sets}} be a presheaf on \mathbf{C}. Consider the map from \mathop{\mathrm{\mathit{Hom}}}_{[\mathbf{C}^\mathrm{op},{\mathbf {Sets}}]} \bigl(\mathop{\mathrm{\mathit{Hom}}}_\mathbf{C} (-,X) , F \bigr) \to FX assigning to a natural transformation \alpha:\mathop{\mathrm{\mathit{Hom}}}_\mathbf{C} (-,X)\Rightarrow F the element \alpha_X(\mathrm{id}_X)\in FX, which is the value of the component \alpha_X of \alpha on the identity at X.

This assignment is a bijection, and it is natural both in X and in F.

Here he writes \mathop{\mathrm{\mathit{Hom}}}_\mathbf{C} for \mathop{\mathrm{\mathit{Arrows}}}_\mathbf{C} and \mathop{\mathrm{\mathit{Hom}}}_{[\mathbf{C}^\mathrm{op},{\mathbf {Sets}}]} to mean the arrows in the functor category [\mathbf{C}^\mathrm{op},{\mathbf {Sets}}], which are the natural transformations.

Finally, Peter Johnstone [4] has my favorite, relatively concrete statement:

**Lemma 6** (Yoneda). Let \mathbf{C} be a locally small category, let X be an object of \mathbf{C} and let F:\mathbf{C}\to {\mathbf {Sets}} be a functor. Then

(i) there is a bijection between natural transformations \mathbf{C}(X, -) \Rightarrow F and elements of FX.

(ii) the bijection in (i) is natural in both F and X.

Now your reward for having climbed all the way up this abstraction ladder with me is yet another abstraction!

Suppose you are given an object Y and you apply the Yoneda lemma by substituting \mathop{\mathrm{\mathit{Arrows}}}(Y,-) for the functor F. Then
\mathop{\mathrm{\mathit{Natural}}}(\mathop{\mathrm{\mathit{Arrows}}}(X, -),\mathop{\mathrm{\mathit{Arrows}}}(Y,-)) \cong\mathop{\mathrm{\mathit{Arrows}}}(Y,-)(X) = \mathop{\mathrm{\mathit{Arrows}}}(Y,X)
Note the order of the arguments! We can also write:
\mathop{\mathrm{\mathit{Arrows}}}(X,Y) \cong\mathop{\mathrm{\mathit{Natural}}}(\mathop{\mathrm{\mathit{Arrows}}}(-, X),\mathop{\mathrm{\mathit{Arrows}}}(-,Y))
Each of the functors \mathop{\mathrm{\mathit{Arrows}}}(-, X) maps from \mathbf{C}^\mathrm{op}\to {\mathbf {Sets}} because these are the dual (contravariant) versions of the represented functors.

So now let’s jump up one more level of abstraction. We define a functor that maps objects to the functors that they represent, and arrows to the natural transformations between those functors. Given an object Y\in\mathbf{C} define the functor \mathop{Y\!o}:\mathbf{C}\to \mathop{\mathrm{\mathit{Functors}}}(\mathbf{C}^\mathrm{op}, {\mathbf {Sets}}) by
\mathop{Y\!o}(Y) = \mathop{\mathrm{\mathit{Arrows}}}(-, Y) : \mathbf{C}^\mathrm{op}\to {\mathbf {Sets}}
and given an arrow f: A \to B with A,B \in \mathbf{C} define
\mathop{Y\!o}(f) = f_* = (f \circ -) : \mathop{\mathrm{\mathit{Arrows}}}(-,A) \Rightarrow\mathop{\mathrm{\mathit{Arrows}}}(-,B)

This definition has the same “shape” as the one for represented functors, but we have abstracted over all the objects and arrows. Also note that we could have defined this as \mathop{Y\!o}:\mathbf{C}^\mathrm{op}\to \mathop{\mathrm{\mathit{Functors}}}(\mathbf{C}, {\mathbf {Sets}}) using duality. All that changes is the order of the arguments in the functors.

The Yoneda lemma can now be used to prove that these mappings are invertible, so \mathop{Y\!o} is what is called an *embedding* of the category \mathbf{C} inside the functor category \mathop{\mathrm{\mathit{Functors}}}(\mathbf{C}^\mathrm{op}, {\mathbf {Sets}}). Thus \mathop{Y\!o} is called the *Yoneda embedding*, and you can read about the rest of the details in the references.
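One direction of that invertibility is easy to poke at in code. The sketch below is Python, with arrows modelled as plain functions in the category of sets, and the names `yo` and `recover` are invented for the example. It sends an arrow f: A \to B to the transformation f \circ - and then gets f back by evaluating the component at A on the identity 1_A. This only illustrates the round trip on one sample arrow. It is not a proof that the embedding is full and faithful.

```python
def compose(f, g):
    """Return f ∘ g, the arrow that applies g first and then f."""
    return lambda x: f(g(x))


def identity(x):
    """The identity arrow 1_A."""
    return x


def yo(f):
    """Yo(f) for f: A -> B.

    This is the transformation Arrows(-, A) => Arrows(-, B) whose
    component at any object sends g to f ∘ g.
    """
    return lambda g: compose(f, g)


def recover(nat):
    """Recover the arrow from the transformation by feeding it 1_A."""
    return nat(identity)


f = lambda n: n * n                      # a sample arrow f: A -> B
f_again = recover(yo(f))                 # f ∘ 1_A, which behaves like f
print([f_again(n) for n in range(4)])    # [0, 1, 4, 9]
```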

This construction tells us why people say things like, “Every object in a category can be understood by understanding the maps into (or out of) it.” This statement can be made precise:

**Corollary 7**. Let \mathbf{C}, X, and Y be given as above.

X and Y are isomorphic if and only if for every object A \in \mathbf{C}, the sets \mathop{\mathrm{\mathit{Arrows}}}(X, A) and \mathop{\mathrm{\mathit{Arrows}}}(Y, A) are naturally isomorphic.

X and Y are isomorphic if and only if the functors that they represent are naturally isomorphic. In particular, if X and Y represent the same functor then they must be isomorphic.

To close, a few final thoughts, and no more abstraction.

First, the modern internet is something of an endless treasure trove for the amateur category theory nerd. I have listed my favorite references at the end of this note, and it’s amazing that you can download almost all of them for free, and sometimes with source code! When trying to understand something that sits on as deep an abstraction stack as this result it is very useful to be able to look at it from many different points of view. So, I am grateful for all of the sources.

Second, I wish I could have thought of a better notation for the represented functor than \mathop{\mathrm{\mathit{Arrows}}}(X,-) with all that placeholder nonsense. I don’t like how the placeholders can stand in for anything you want and how their meaning can shift and change in different contexts. But, even with those problems it’s better than hiding the definition behind yet another layer of naming (e.g. H_X), which is the only other obvious choice.

Third, you might have found my use of \mathop{Y\!o} for the Yoneda embedding to be frivolous, and perhaps childish. And I would have agreed. But then I read in multiple sources that the Yoneda embedding is sometimes denoted by よ, the hiragana kana for “Yo”.

Given this, how could I resist?

Finally, I need to shout out the excellent tutorial video by Emily Riehl that demonstrates how this result works in the specific category of matrices [10]. The whole Yoneda picture suddenly became more clear while I was watching this talk the second time. Her book, *Category Theory in Context*, is also excellent [9]. Recommended.

\mathbf{C}, \mathbf{C}^\mathrm{op} - Categories and opposite categories.

\mathop{\mathrm{\mathit{Objects}}}(\mathbf{C}) - Objects in a category \mathbf{C}. Often just written \mathbf{C}.

\mathop{\mathrm{\mathit{Arrows}}}(\mathbf{C}) - Arrows in a category.

\mathop{\mathrm{\mathit{Arrows}}}_\mathbf{C}(X,Y) - Arrows between two objects. Also written \mathop{\mathrm{\mathit{Arrows}}}(X,Y) or \mathop{\mathrm{\mathit{Hom}}}(X,Y) or \mathop{\mathrm{\mathit{Hom}}}_\mathbf{C}(X, Y) or just \mathbf{C}(X,Y).

f: X \to Y - An arrow from X to Y.

g \circ f, gf - Composition of arrows.

X \cong Y - Isomorphism.

F:\mathbf{C}\to\mathbf{D} - A functor from \mathbf{C} to \mathbf{D}.

\alpha: F \Rightarrow G - Natural transformation.

\mathop{\mathrm{\mathit{Functors}}}(\mathbf{C}, \mathbf{D}) - Functor category between \mathbf{C} and \mathbf{D}. Also [\mathbf{C},\mathbf{D}] or \mathbf{D}^\mathbf{C}.

\mathop{\mathrm{\mathit{Natural}}}(F, G) - The collection of natural transformations from F to G. Also written [\mathbf{C},\mathbf{D}](F,G), or \mathop{\mathit {Nat}}(F,G) or just \mathop{\mathrm{\mathit{Hom}}}(F,G).

\mathop{\mathrm{\mathit{Arrows}}}(X, -) - The represented or “arrow” functor for X. Also called the “hom” functor and written \mathbf{C}(X,-), H^X, \mathop{\mathit{hom}}, or \mathop{\mathrm{\mathit{Hom}}}(X,-).

f \circ -, - \circ f - Post- and pre-composition maps. Also written f_* and f^*.

\mathop{Y\!o} - Yoneda Embedding.

[1] Sean Carroll, *A No-Nonsense Introduction to General Relativity*, 2001.

[2] Eugenia Cheng, *The Joy Of Abstraction*, 2022.

[3] Julia Goedecke, *Category Theory Notes*, 2013.

[4] Peter Johnstone, *Category Theory*, notes written by David Mehrle, 2015.

[5] Tom Leinster, *Basic Category Theory*, 2016.

[6] LobosJr, *Dark Souls 1 Speedrun, Personal Best*, 2013.

[7] Saunders Mac Lane, *Categories for the Working Mathematician*, Second Edition, Springer, 1978.

[8] Paolo Perrone, *Notes on Category Theory with examples from basic mathematics*.

[9] Emily Riehl, *Category Theory in Context*, Dover, 2016.

[10] Emily Riehl, ACT 2020 Tutorial: *The Yoneda lemma in the category of matrices*.

[11] Peter Smith, *Category Theory: A Gentle Introduction*, 2019.

This one is so easy it’s almost cheating. This scheme is based on a recipe I have stolen from Marcella Hazan. Buy her book, it’s in there. But I’ve adjusted the flow a bit to make it easier to follow. For me.

First get out your pasta cooking pot. Fill it with 3-4 quarts of water and start it heating. Toss a few big pinches of salt into the water. You don’t want *too* much water because you want the noodles to be able to make the water starchy while they cook, so you can use the starchy water later. You can do the rest of the recipe while the water gets up to boiling and while the pasta cooks. This recipe is calibrated to about a pound of pasta, give or take.

Now chop up three or four cloves of garlic and 3 or 4 slices of bacon (or more, go for it). Put these into a medium sized saute pan with some oil and cook it until the bacon is crispy and frying in its own fat for a while. When it’s done deglaze the pan with white wine and reduce for a few minutes, then turn the heat off.

At this point your pasta water should be ready for you to cook the spaghetti, so drop it in.

While the pasta cooks, crack three eggs into a bowl and beat them. Add salt and pepper to taste. This will not take that long, so when you are done grate about a cup or a bit more of a mix of Romano and Parmesan cheese. The dark secret here is that Romano is better for this dish, and if you want to you can just not use any Parmesan at all.

When the pasta is almost done, turn the heat on under the bacon for a bit to warm it up.

When the pasta is actually done take out one scoop of the water. Then drain the rest and dump the noodles back into the pot. Add the bacon and garlic and all the oil in that pan and mix it around. Then add the egg and 3/4ths of the cheese and mix it around. If it’s too saucy, add cheese. If it’s too dry, add a bit of pasta water. When that is done, garnish with some fresh green herb if you want, but I never seem to do this anymore. Then add as much black pepper as you can stand to grind in without your hands cramping up.

Now you can gorge yourself.

Oh yeah, if you are nervous about raw eggs, be careful where you buy your eggs. Or you could always wuss out and get pasteurized eggs.

I love two things about this dish.

You never actually have to cook the sauce, per se. You just have to mix it together.

It’s really a bacon and eggs breakfast on top of pasta, with cheese. How brilliant is that?

Finally, if you are more efficient, you can probably do this with only one pot and the saute pan. But this flow is a bit easier.

**Notes from 2023**:

Yes I know I should be using guanciale. The stuff is hard to find locally though.

Also apparently I should not be using garlic. I’m surprised that Hazan would have a controversial stance on this. But there you go.

Long time readers will be familiar with my general annoyance at a lot of Western writing about how to cook Chinese food. With relatively few exceptions I tend to think that people make this food out to be more complicated than it really is. I also tend to think that people, even Chinese, or Chinese-American people tend to attribute an almost mystical aspect to the technique needed to cook good Chinese food in a home kitchen.

Nothing sums up both of these annoyances as well as the almost ubiquitous obsession with *wok hei* (Cantonese) or *guo qi* (Mandarin) which literally translates as “pan gas” but which has been more poetically rendered as “wok breath” or “breath of the wok” in the West. The idea is that when you cook food in a really hot wok, over a really high flame, the resulting food will have a particular smokey/grilled/singed flavor profile generated by the heat and the flame acting on the oil in the pan. This is especially strong when the food is straight out of the pan and on to a plate in your favorite Chinese takeout joint.

This is all fine. I have no doubt that the food tastes this way, and that some combination of the pan and the technique of the chef makes it taste this way. While before you would have just had to take some writer’s word for it, these days you can just go to youtube and watch Chef Wang set his food on fire over that jet engine wok burner and you know what’s up. His commentary will even explicitly mention getting the “锅气” out of the food. More recently his English subtitles have started translating this as “wok hei”, since whoever is writing them knows that’s a phrase people know. But earlier videos that are more automatically subtitled say “pan gas”, which amuses me.

So what is my problem? My problem is that people (or at least cookbook publishers) seem to think that achieving this flavor profile is the be all and end all of cooking Chinese food at home. Find almost any Western, and especially English language, writing on Chinese cooking and some mention of this technique will show up early and often. Questions of high heat burners and hot pans tend to sit front and center in any discussion about Chinese cooking.

This attitude always confused me, and I have come to realize (from Kenji’s comments at the end of this video) that this might be because I grew up eating Chinese food at home. My mom cooked everything in a cast iron skillet on a *coiled electric stove*. It was better than the food in the restaurants. But, most of the food writing that annoys me is from people chasing *restaurant style* Chinese food and trying to figure out how to do *that* at home.

Now, I don’t want to begrudge people their irrational obsessions. God knows this web site is nothing if not a catalog of mine. But there is *so much more* to Chinese food than just stir fry technique.

Chinese cooking encompasses every possible technique in the kitchen. There is stir fry (which is really just saute, don’t @ me), roasting, braising, deep frying, smoking, bread making, pancakes, dumplings (the best dumplings), noodles, soups, hot pot, pickles, and anything else you can think of. Yet over and over again people just go on and on about stir fry. Scroll around in Chef Wang’s videos, see how many of them involve setting the wok on fire. The number is actually pretty small. If you go to any of the huge number of other Chinese language cooking channels on youtube you see the same thing, even when the cooks used to work in restaurants.

I guess it’s no coincidence that almost none of my favorite Chinese food is restaurant style high heat stir fry. Steamed fish is steamed. Congee is over-boiled rice. Dumplings are boiled or steamed. Ma-po tofu is a medium to low heat stew. Three cup chicken is almost stir fried, but it’s really a braise. Good hot pan technique is probably most critical with the green vegetables … but to me getting that right is not that different from getting the same thing right at a saute station in France.

On the other hand, my annoyance is not completely fair. Even the book that I usually credit with starting the whole *wok hei* obsession, Grace Young’s *The Breath of the Wok*, covers the many other aspects of Chinese food that I have mentioned, though that coverage is later in the book after all of the semi-mystical wok material. So maybe this whole thing just circles back to *my* irrational obsession with my own probably unfair perceptions. Still, by far the most extensive recent piece about Chinese food in the newspaper of record is Kenji’s fifteen hundred words about wok hei at home, so maybe not.

In either case, I am here to say my piece about *wok hei* and then shut up, and my piece has two parts:

Just stop it. Go learn to make other things. It’s just not that important. You can make Chinese food on any stove you have and almost any pan you have. Stop overthinking it.

I take no credit for thought number 1, except that it lived in my annoyed brain for years. All the credit for expressing the thought goes to the heroes at Chinese Cooking Demystified for making all that great food on a little hot plate and sharing my confusion over the *wok hei* obsession. Take their advice about wok technique and listen to their truth about what you need in a wok.

Now I’m going to go back to my kitchen, and my favorite 12 inch non-stick aluminum wok shaped pan, and make dinner. This pan, by the way, is great for *everything*. It’s not too small. It’s not too big and heavy. The coating can take a beating and it still works fine. And the flat bottom lets it double as a small frying pan for things like French omelettes and frying hot dogs. I keep a 14 inch steel wok around for deep frying but it’s just too big and unwieldy for day to day, and because I don’t use it enough the surface is never any good. Maybe I’ll try to find a nice iron one to set on fire over the grill.

The computer programming industry is one that is full of apparent contradictions. On the one hand, over the last forty years or so we have seen consumer computing evolve from a hobby that engaged a few tens of thousands of people into an industry that has almost literally put a stupid computer into every single aspect of our lives. And often that stupid computer is hooked up to a giant world wide network of other stupid computers, amplifying the stupid exponentially. The evolution has been so drastic that if you could travel back in time and tell any computer nerd in the 80s what a pocket computer would look like in the 2020s you’d get put in a hospital for the mentally ill.

On the other hand, when you abstract away enough of the details, computers these days are almost exactly the same as they were in the 80s. Just more. You have a CPU, some memory, some mass storage, a network connection, and a pretty cool graphical user interface for programming the thing.

Tools for programming the dumb machines have also retained a very similar structure. There is a collection of compilers that translate code in higher level languages to binary machine code. There is a build system to mash all the machine code together in the right order. There is a debugger that never quite works.

To the pessimistic soul it might seem as if nothing has changed and that we have created very little new in the last forty years. Being a pessimistic soul I have some sympathy for this point of view. In fact, earlier this year I even went through an exercise for an article I never wrote where I cataloged a few dozen ideas that I called “cults” because of the unbridled enthusiasm with which they were sold into the technology industry by their adherents. The pessimist in me could not help but snark at the optimists from the past:

Structured programming: If you write programs in small blocks where GOTO is not allowed, everything will be much better.

LISP: Enough said.

The Object cults: OO Languages, databases, patterns, OO Design, etc: Trying to bring the joy of LISP in the 70s and 80s to the corporate programming salt mines of the 90s.

AI: This has risen and died multiple times. The current moar brute force gradient descent craze is the latest cycle.

The Free Software Cults: Turns out these folks were mostly a bunch of jerks.

Client-Server/OORPC/Middleware: This is how computers on networks talked to each other before HTTP.

Web, Web 2.0, Web Mobile: The Internet is the best place to deploy software. Oh except it’s actually the worst.

Process cults, most recently Agile: If you just follow this simple state machine of rituals software becomes easy. Well … no.

The Type Theory Cult: I have talked about this before. Very useful for certain things. No silver bullet.

The Silicon Valley Bro-Nerd Myth: Turns out these guys were mostly jerks too.

And on and on.

I am of course being maximally unfair. All of the cults above actually embody the kernel of a good idea. They are also all infected with the one true nerd idea that ruins everything: *If it works well for me, it should generalize and work well for you too*. But, this is in fact never true. Or rather it’s only true if you decide that you can ignore the details of each unique situation when you go in and start bashing your solution hammer. In software this is the one thing you cannot do, because the whole world is in the details. As we all know from reading our Fred Brooks, the complexity of software is *essential*, not incidental. You can’t just abstract the details away without losing the ability to solve part of the problem.

I never managed to write the full article about all the technology cults that I have tried to ignore over the years because in the end I realized that aside from some small bits of pithy snark I had nothing else interesting to say about them. All that you can say is that they all missed the point about not ignoring essential details, which is why they don’t really work out, and move on with your life.

I think the flip side of this story is more interesting though. It is easy and lazy to decry the software industry as ultimately shallow and empty because at a high level it seems like the tools we are using now are the same as we were using in the 80s (and maybe even the 70s). It’s easy to read, say, the Xerox Smalltalk papers, or the Lisp Machine papers, or the Xerox (again!) Cedar papers ^{1}, and wonder about what grand new tools could have evolved from *that* basis instead of where we ended up. If you do this with enough gusto, and ignore enough of the details, you can only conclude that we must have missed some huge boat if we are still reading core dumps, or splunk logs, or single stepping through code in the terminal, or whatever we do these days.

But, I think if you dig into the details of the tools that are available these days, instead of just staring at their abstracted exterior structure, you have to conclude that we have made a lot of progress even if it is “only” linear.

I have a few recent examples of people working in the details to make things better:

Pandoc - This is a truly useful tool and arguably the most useful thing to come out of the whole Haskell universe. Among other things this engine generates the HTML that you are reading now with relatively little fuss. And, it’s flexible enough to let me put in some of my own bibs and bobs in a way that isn’t too much more difficult than in, say, Python. While you might say that file format conversion is not exactly *new technology*, this particular packaging of the idea along with the tricks that you can get from the higher order functional nature of Haskell makes the tool much more useful than it otherwise might be.

Pernosco - Debuggers suck. They make you reason about what your giant system is doing by staring at it through a tiny peephole at particular moments in time, and then making a lot of guesses. Pernosco captures everything about an entire execution of your program and lets you look at all of it at once. This is not just a tool that lets you debug forward and backward in time. It’s more like a tool that lets you see all of time at once, and query it like a database.

Github static analysis - A buddy of mine just deployed a giant system at github that lets you do semantic navigation of code in the github browser interface. That’s scaling the compiler up to gigantic proportions. Read about it here.

Cell phone cameras - This is not a programming tool, but has to be listed here. I don’t know what deal these people made with the devil, but the laughing in the face of physics in the name of better image capture marches on. These cameras were impossible five years ago. They are even more so now.

If I can find these new and pretty cool things off the top of my head in 5 minutes I’m pretty sure many more exist. No, they are not some exponentially powerful hammer that will do away with the need to really understand things and do good work writing code. But that was never going to happen anyway.^{2}

The whiners will always find some system from 30 years ago that has the same shape as whatever you show them but doesn’t really solve the same problem, or doesn’t solve it as well, or doesn’t solve it for as many people. They are missing the whole picture, which is what makes them whiners.

It turns out that while the pessimists and the optimists disagree with one another they are both making the same mistake when forming their opposing views: they are ignoring the details when thinking about software.

It’s hard to read some of the programming environment literature from the 80s, especially from Xerox PARC, and not be a bit wistful when realizing that all of that stuff existed, and it’s the UNIX model that won.↩︎

I know I get tedious about this, but Brooks really was right about *No Silver Bullet*.↩︎

I started making this a couple of years ago off of the recipe in this youtube video by Steph and Chris of “Chinese Cooking Demystified” fame. They also have their recipe written out on reddit. The Fuchsia Dunlop recipe is also a classic.

The main problem with using someone else’s recipe for this dish is tuning the balance of the spices. You want just the right amount of heat, chili flavor and Sichuan peppercorn tingles in your tongue. Get any of these factors out of balance with the rest and the whole thing is ruined.

So here is what I do, which gives me the spice balance I want.

1/4 lb ground meat. Any kind will do, but ground chicken or turkey will be bad.

1lb of soft tofu, cubed. In the recipe above they soak the tofu in hot water for a while. I don’t do this.

1 tablespoon Sichuan Chili Bean Paste, i.e. Pixian Doubanjiang (郫县豆瓣酱). Your local Asian store will have this. Look for a package that looks like this.

1 tablespoon ground chili flake. Your local store should have this too. If you want to splurge on something fancy this stuff is great.

2-3 teaspoons of Sichuan Peppercorn. You can use red, or you can use a mix of red and green. Again, your local store will have this. Again, the stuff at the Mala market is great.

1 tablespoon fermented black beans, chopped.

3 or 4 cloves of garlic, minced

An inch cube or so of ginger, minced.

1-2 cups of chicken stock or water. It really does not matter which you use.

One bunch of scallion with the white part in one bowl and the green part in another bowl.

A bit of light soy, a bit of dark soy, Shaoxing wine if you want to get fancy.

Corn starch slurry (2-3 tablespoons of corn starch mixed with twice as much water).

First toast the peppercorns lightly on medium heat, and then grind them up in a mortar and pestle (or a coffee mill that you use only for spices). Put the result in a bowl. Put the chili flake in another bowl.

Next, chop up the black beans, put in a bowl.

Next, chop up the chili bean paste to break up the large pieces of beans, put in a bowl.

Put the rest of your aromatics in a bowl.

Put your pan (any medium sized sauce pan or soup pot will work, or a wok) on medium to medium high heat. Add oil and brown the meat, chopping it up into little pieces. When it’s pretty brown and has rendered out, turn the heat down to low and pull the pan off the heat to cool off for 30 seconds. Add the chili bean paste. Put the pan back on the heat and carefully warm up the chili bean paste until it lets out its oil and aromatics. You do not want to burn it. This will take a minute or two. Mix the red oil into the meat.

Now add your ginger and garlic and the white part of the scallion, chili flake, and black beans. Mix it around. Add about half of the ground Sichuan peppercorn at this point. Now you can turn the heat back up to medium to medium high.

Add a cup or so of water, then the tofu, then add the soy and the wine and more water until you get a good balance of liquid to tofu depending on how saucy you want the final result. Mix this around to make sure the balance is right and wait for it to come back to a slow boil. I like to be able to see the top half of the tofu cubes sticking out of the water. You will get a feel for this.

When the liquid in the pan is back to a soft boil, add a third of the corn starch. Stir it around to see how much the sauce thickens. If it’s still soupy, add a bit more, stir it around. Repeat until you get the consistency you want. Again, you’ll get a feel for this.

Here’s what my final result looks like in the pot at the end:

And in a bowl

And in a bowl with rice

To finish up, taste a spoonful to check the balance of heat to numbing flavor. Add a pinch or two more of the Sichuan peppercorn if you want more of that kick. Or a pinch or two more chili flake if you want more heat.

Finally, put the green part of the scallion on top for garnish.

Serve over white rice. Nothing else. If I catch you eating this with some brown rice/quinoa medley shit I’ll come for you.

If for some reason you want to make vegan friends you can use dried shiitake mushrooms that you soak and then chop up instead of the ground beef. They will never know what hit them. To unmake the new friends you can then tell them you actually put beef in it after they eat the whole pot.

Finally, you will notice that my recipe, the Dunlop recipe, and the Steph and Chris recipe vary a lot in how much of the spicy stuff they use. I’m not sure why this is, but my amounts have ended up working best for me over time. Again, after you do this a few times you’ll get a feel for what you like. Luckily the end to end cooking time on this dish is only about 15-20min. So you can experiment very quickly. Have fun.

Back in the late 90s I bought a pair of books called *Darkroom* and *Darkroom 2* originally published in the late 70s that interviewed a couple of dozen “fine art” photographers about the nature of their work in black and white darkrooms. In retrospect the 70s and 80s were probably a peak, of sorts, for fine art photographers who worked in black and white darkrooms. This peak held on through to the 90s, with enthusiastic enthusiasts still willing to gobble up behind the scenes “how to” information about building a perfect room in your house to sit alone in the dark to make pictures. This enthusiasm started to wane by the end of the 90s and then was completely destroyed by the consumer digital SLR in the early 2000s and now you never hear about people’s darkrooms anymore except when they are selling the pieces on ebay.

This timing is an interesting personal coincidence for me because my personal digital picture archive stretches back to exactly around 2002 when I started using a shitty digital point and shoot for kid pictures while still shooting black and white film with my “real cameras”. It is notable that I still have most of the digital pictures, but never look at the film ones at all.

My working method with these pictures was never really that interesting. Until about 2007 I used an ad-hoc and cobbled together set of tools to store a year or two of full sized pictures on my laptop and work with reduced size pictures on my web site. Older pictures were offloaded to an iMac with a larger disk and various other backup media. I stuck with this year at a time kind of scheme until around 2015 when laptop disks started to get big enough to store more pictures at once on the laptop.

After 2007 I started using Lightroom as a single tool that did all three previous jobs. Since then I have moved from Lightroom to Lightroom and never really thought about it. I even stuck with the tool after Adobe committed the unforgivable sin of switching to a more sustainable subscription-based business model in the 2010s. For better or worse Lightroom had my data in it, and I didn’t want to move it.

Of course, the Lightroom landscape has been a bit confusing in the past few years. Let me summarize…

First, the subscription Lightroom became “Lightroom CC” … for Creative Cloud (barf).

Then Adobe released a simplified and mobile-syncing version of Lightroom that is completely different from the old Lightroom. And confusingly called *that* Lightroom CC and called the old one Lightroom “Classic” or “Classic CC” (I think, I can’t actually remember). I think this convention bounced back and forth multiple times.

Since that was confusing, later they changed it up again and called the new Lightroom CC just “Lightroom” and the old one just “Lightroom Classic”. So it all ended up in a good place.

I watched all this with detached amusement and ignored it because this new Cloud (Fog) tool seemed half baked and using a far away server as the way to get pictures from one device to another device when both devices are in my house always seemed dumb.

So of course in around 2019 I started to noodle with using the new Lightroom as my main tool because using a far away server to get pictures from my phone to my laptop even though they are both in my house right next to each other is actually easier than any other possible solution … sigh. I waited 10 years for something better and it never came, so I gave up.

I ran the catalogs for my 2019 pictures in parallel for a while until it was clear this new Lightroom did most of the things that I needed it to do. Digital cameras are so good these days that even a photo dork’s needs for elaborate post processing are pretty limited now. My main gripes about the new tool are that the navigation through the user interface is weird, they moved a few key commands around, and there is no clear way to make a *local* backup of a catalog (more on this later), although backing up the photos so you can remake a catalog (except for albums) later is easy enough.

After a couple of years with the new tool, and a new laptop where I could store all of the picture files in one place, I also finally got started on a project that I had been putting off for ten years or so: rebuilding the old reduced sized picture albums I had made in the early 2000s before disk space was infinite. This is where you really find out how different the new and old tools are.

My old catalogs are split into three groups: one catalog covers all the old pictures that I did not split into single years between around 2001 and 2004. Then there is one LR catalog per year until 2015. And then 2015 to 2018 are all in one catalog.

Old Lightroom to new Lightroom migration is mostly straightforward except for the following hiccups:

In old Lightroom, catalogs stored only meta-data and album/file structure, not the photo files themselves. This means that various bits of auxiliary data could be stored both in the LR catalog and in the file system so they would be constantly out of sync. The migration engine is picky about this, so you are in for some tedium in trying to get rid of these conflicts before moving things.

You can only migrate a given catalog once. So don’t fuck up.

Sometimes the migration engine will tell you that it migrated 0 pictures, when it actually copied all of them. I’m not sure what this means, but it’s a bit upsetting.

Related to the above, after migrating a catalog you can’t migrate *copies* of the catalog, or copies of the copies. So if you made new catalogs from year to year by copying the old one and removing all the pictures you will be sad. Luckily you can work around this by making a *new* catalog in old LR and importing the original catalog into it, and then migrating *that*. Sigh.

Most migrated pictures seem to start in a state where they have “legacy” settings, which makes new Lightroom throw up a warning which you have to click off by hand before you can edit them. Sadly there is no way that I can find to clear this flag except picture by picture. Which is kinda tedious if you have around 70K pictures.

Finally, some obscure bits of meta-data, like old camera profiles, migrate to the desktop version of the new LR but not the mobile versions. And again, because of the warning above there is no easy way to fix this in a big batch in the new LR. So it’s best to do as much of this batch editing as you can in the original catalog before moving it.

Overall the migration process is pretty painless and except for the very earliest catalogs had the advantage that it brought over my old albums so I did not have to go through the pictures one by one and rebuild those. It takes a few minutes to suck in the original files, and then a few hours to sync them to the fog.

I *did* have to spend a bit of time reconstructing the oldest albums (2002-2006 or 7) and while this is a sort of tedious exercise in principle, in practice it turned out to be great fun to reacquaint myself with what I was interested in taking pictures of back then. It’s mostly kid pictures, and a lot of people who I have not seen as much of lately for various reasons. If you ever have time to do a leisurely scan of your old photographs I recommend it. It was also an opportunity to add some photos that I edited out before for various reasons, but mostly because I tried not to have too many similar pictures of things taking up space. Now space is not really a concern, so I can toss a few extras in for the fun of it. There were also a lot of pictures that I just missed before, so overall this whole exercise has been enlightening and fulfilling.

I also have to say that the old D100 and the D70 cameras were incredible machines for the time. I have been constantly amazed at how well those old image files hold up. The D200 is a distinct step *down* in many ways. And of course the D700 was a big step up. Having looked through all these old pictures I also conclude that I probably could have just shot JPEGs this whole time. And yet I still shoot camera RAW files with my new non-iPhone cameras. Stupid brains.

The only remaining bit of anxiety that I have with this new tool is the lack of a more transparent local backup solution. This was my major reason for not moving to it before, and at some point I just decided to solve the problem later rather than worry about it now.

In theory the Adobe Lightroom web service *is* the backup. But we all know better than to trust *that* to work out, given the fog and all. You can tell the desktop version of the new LR to store a copy of each original in your local file system. But this is more to facilitate offline editing and exporting at full resolution than backups. The new Lightroom explicitly forbids you from *copying* your local library to a new machine and then opening it there. Instead you have to log in to the service on the new machine and download the catalog from there.

This weird and arbitrary limitation is similar to the strange thing about only being able to migrate an old LR catalog once, even though it’s actually easy to migrate an old catalog as many times as you want. It feels like a UI decision dictated by some obscure part of their database schema rather than being something that you would want to do on purpose.

I think what I will end up doing is exporting copies of all the pictures that are in the yearly albums so that one could easily make a new catalog from those, so at least I have a second backstop for the most important stuff.

Or, maybe I’ll do a backup export of all the photos, but as JPEGs or HEIFs instead of RAW. This has the twin advantages of both being more universally readable and baking in the current processing parameters. Even with the great job Adobe has done preserving my processing instructions there are still some subtle differences caused by the evolution of their imaging engines … so baking in the current state is not a bad tradeoff to make.

In any case, after all this work it will be nice to have full resolution versions of all the old pictures, and some new old pictures besides. In addition to looking better just by being full size, they also look better because I could not resist tweaking them a bit to match how I make pictures look more recently.

In this way the spirit of the old darkrooms lives on in the new Lightroom.

A couple of Sundays ago we stopped at Rose Tea to get takeout to bring to a friend’s house. This is a thing we had done dozens of times since they opened almost 20 years ago, but this was the first time since the great stupid darkness that we had done it. Just as I finished ordering, the long time manager of the Squirrel Hill landmark told me that I had walked in on their last day of service. I am not sure of the details, but it sounds like they sold the location to someone else in order to take a well-deserved rest.

So that was the worst thing that happened to me that day.

The second worst thing was that being discombobulated from the news, I forgot to order one of the things they do best: the fried rice with Chinese Sausage. And now I won’t be able to get it again. Sigh.

Along with the O, the loss of Rose Tea will now be something that I forever associate with this particular point in history, although it’s possible that the place would have shut down anyway. As the manager said: they had been planning something like this for a while.

But, whatever the real reason for the closure, the fact remains that with Rose Tea gone we have lost an old favorite. Ever since they started serving their full food menu some time around 2004 Rose Tea has been the main proof that you could run a Chinese place in Pittsburgh without putting any of the standard Chinese American “classics” on the menu.

Here is the first thing I wrote about them then:

Rose Tea has gone from a purveyor of an odd Asian drink craze to simply the best Chinese food in Pittsburgh period. The home style Taiwanese food that they serve here is actually good enough for me to want to go even if I’ve recently eaten my mom’s food. In fact, it is much like my mom’s food, which is why it simply rules.

Here are some dishes that Rose Tea does that are so good they will make you weep with joy:

- Shredded pork with pressed bean curd.
- Any of the whole fish
- Taiwanese Chunk Chicken.
- Pork Stomach and Duck Blood.
- Taiwanese style braised Beef Stew (the stewed beef is actually cooked long enough).
- Taiwanese style rice cakes (just like mom’s).
- Chinese greens that are always cooked right. Not just every other time you go.
- The pickled cabbage appetizer
- The Taiwanese sausage appetizer
- The soy sauce eggs (just like mom’s).

All this and good prices too. I used to rate Chinese places in Pittsburgh by how many different sauce colors they had on their menu. It is therefore something of a watershed to have a place in town that has two brown sauces that are completely different in flavor. There is finally real Chinese food in Pittsburgh. Go get it.

P.S. If I catch you in P.F. Chang’s, I’ll kill you with my bare hands.

Pretty much right up until that last Sunday, all of this basically still applied. For most of their time at the corner of Forbes and Shady they have set the standard for well executed Taiwanese food, the best bubble tea, and good and reliable service.

But Rose Tea has always been more important than just this to me. To me they represent a huge part of the beginning of the “Pittsburgh Food Renaissance” in the early 2000s. It seems to me that you have to include Rose Tea in with the group of “one word name” restaurants that all opened around that time and started making people pay attention to Pittsburgh food: Legume, Cure, Salt, Spoon, Dish, BRGR, and so on. The also recently departed Ka Mei also belongs on this list, especially in their original incarnation as Tasty in Shadyside.

I think we have Rose Tea to thank, at least partly, for the existence of the great Chinese Food corridor that exists in Squirrel Hill right now. And I think that for some strange reason they don’t get enough credit for this in the local foodie circles. It’s hard for me to imagine How Lee, Chengdu Gourmet, Everyday Noodle, Cafe 33, and the rest thriving as they have without Rose Tea having been there at the start to show people how to eat the food.

So anyway. There is not much more to say except goodbye to another old friend. You can still get a limited version of the Rose Tea experience at their mini-place in Oakland next to CMU. But it’s not really the same. And, we also have Cafe 33 with its own excellent take on Taiwanese food. Which is a great blessing.

Even though I said in my recent update that I would not migrate the site to a new content generation engine in my lifetime I went and did it anyway. This is fairly typical behavior for me. No one, least of all me, can tell if I’m going to disagree with myself five minutes after I declare something to be the truth.

Anyway, I couldn’t very well waste all that time I had taken to set up a stupid Haskell environment. With Hakyll in hand I started messing around with the one thing that would have kept me from changing anything: the CSS.

Unlike some of the more popular site generators Hakyll does not come with a large collection of themes, none of which look quite right. It turns out this is both a curse and a blessing. The curse is that you have nowhere to start. The blessing is that you can find other Haskell nerds on the Internet and steal their code and then figure out what you need. So, step by step over a weekend I stole pieces of this and that until I had adapted the standard Hakyll layout into something that looked like a replica of the old site. The worst thing was trying to understand how CSS page layout works well enough to make the site not look completely idiotic on my phone. I assume it will still look shitty on some device somewhere. I have decided not to care.

Next I had to adjust the HTML generator inside Hakyll (and thus really inside pandoc) to generate \KaTeX markup for math and also to generate URLs and feeds that match the layout of the old site. Can’t be making broken links.

Finally, along the way I also made a few improvements:

Got rid of the dumb sidebar that I never liked.

With the sidebar gone I could make the font size slightly larger for old eyes.

Got rid of the post categories that I never really used for anything.

Fixed some old text conversion errors.

Made all the links to other pages on the site relative.

Simplified the top level page to remove the useless teaser text for each post. Sharp eyed readers will now note that the home page and the archive page are basically the same. So why have both? Maybe some day I’ll change the home page to something else.

Things I like about the new setup:

There is only one small file of CSS that I can now just forget about.

I like how the simpler layout eventually turned out. It’s very much in the spirit of the 1998 “Blue on White” Internet look that I’ve always been after.

Hakyll has a nice mode where it watches the file system and regens the site when you edit things. I’m sure pelican had this too but I never set it up. This made fooling with the site CSS almost tolerable.

No python nonsense to worry about.

Something is turning all the quote marks into “smart” quote marks.

With the various changes the site can actually run mostly locally on any computer you want. This also makes spot testing changes to the site a lot easier since all of the navigation stays local now. I should have set things up like this before, but was too lazy.

Now I can explain to people what the difference is between `let` binding and the `<-` operator in Haskell.

I like that after decades of research and development arguably the most useful thing that Haskell does is … convert markdown and \TeX to HTML (or PDF!) for web sites. If nothing else this new site is a celebration of the idea that Haskell is truly the perl of the 2020s. If you know me you know that this is a compliment of the highest order 🙂.

I will now resist the urge to tinker further. There are still a few inconsistencies in the text and layout of some pages. But it’s not serious enough to keep me up at night. If this change broke your feeds, I apologize. But, you can just update them using the new link.