I am very pleased. For years, now, I have striven to develop a clean and simple temperature control algorithm for my coffee machine which gets it to a stable temperature in the minimum possible time. This weekend, I cracked it: two minutes to a stable 95°C on my Gaggia Baby Class with no oscillations at all^{1}That’s not quite all that is needed – the brew head needs bringing up to temperature as well. But that’s a different topic and I’m still working on it. Two minutes is also not an absolute minimum because the boiler power can be increased. Also a topic for another day. For the purposes of this post, we’ll use the stock 1350W the machine shipped with..

This is how I did it…

## The preamble you can ignore if you know about PID

The canonical coffee machine temperature control algorithm is PID, or “proportional integral derivative” control. This starts with a simple idea: if the difference between the target temperature and the present temperature is e (for “error”), then we apply an amount of power equal to P × e where P is the coefficient of the “proportional” component of the controller. The greater the divergence from target temperature, the more power the controller applies. When stable (assuming this is possible), the error will hopefully be fairly small, but just enough to cause maintenance power to be supplied.

There are two great problems with a simple P controller. The first is latency. Any latency in the system will cause oscillations for sufficiently large P values because the controller will apply correcting power for too long (the whole time of the latency), by which time too much energy will have been injected into the system and the temperature will overshoot the mark, where the cycle begins again with too much correcting power in the opposite direction. The only way to get to a stable temperature is to use a small value of P. But this can mean that it takes a very long time to reach stability. And I want coffee now!

The second problem is power loss. Any real-world coffee machine leaks heat to the surrounding air. If the rate of loss is L, then P × e = L when the system is stable so there must always be a temperature error of e = L / P. For large enough values of P, the error can be made arbitrarily small. But, hang on, we can’t make P arbitrarily large because of the problem with oscillations!

The first of these problems can be mitigated somewhat if we look ahead to the future. If we measure not just the error e, but also the rate of change of the error e'^{2}Although measuring derivatives of noisy and discrete variables is also fraught!, then we can guess what e will be after the latency period by adding some multiple (let’s call it M) of e’ to e. So now power is P × (e + M × e’). Or, let’s collapse that by introducing the “derivative” coefficient D, which gives a power of P × e + D × e’ where D = P × M.

Finally, the fixed-offset error could, perhaps, be mitigated by maintaining a total of the error over time, i.e. the time integral of the error. Let’s call this integral E. If we apply a coefficient I to E, then over time any fixed-offset error will accumulate in E and the gap will slowly close. So the total PID controller applies power P × e + I × E + D × e’. Hence “PID”.

This would be lovely, except the I component of the controller also introduces oscillations and so the I coefficient also needs to be quite small and so, once again, it takes forever to get to a stable temperature and my coffee thirst is not abating!

So people who implement PID controllers end up tweaking them endlessly. There must be a better way.

## PID on the Gaggia Baby Class

Here’s a graph of what happens if you run the Baby Class up to 110°C at full power and then switch the heating elements off (power is on the right y-axis):

The latency is clear: it takes about 10 seconds for the temperature curve to fully respond to the discontinuities in the power curve^{3}Bother! Having slaved over this graph, I now notice I used 1600W and I had promised to stay at 1350W. I can, however, assure you that the same latency occurs with the stock boiler.. I have spent years trying to work around this.

The latency means we must have small values for P, I and D, which means that the power applied to get a stable temperature has to be varied very slowly over time. The whole section between about 50s and 170s below is just waiting time. And I even pushed the P coefficient so we do get a little bit of oscillation^{4}In the interests of clarity, the two graphs below are taken from simulations rather than a real coffee machine. This allows me to simplify the presentation by removing effects that are not germane to the present conversation.:

The ideal controller, on the other hand, would apply full power for some period and then immediately switch to maintenance power at exactly the right moment. i.e. the stable temperature would be approached very quickly^{5}The smooth tailing-off of the power between 60s and 100s is caused by power loss to other components of the system which are still warming up rather than the controller approaching the target power slowly..

## The eureka moment(s)

What if there were no latency? Then the P value could be really large without causing oscillations, so there would be no need for the D component of the controller, and the fixed-offset error would be really small so there would be no need for the I component of the controller and the controller would be very reactive, so the system would stabilise really quickly.

So, what if, instead of applying a PID controller to the measured temperature, we apply just a P controller to a digitally modelled temperature that responds immediately to changes in power? This would work perfectly. Well, perfectly with one (critical) down-side: no model is perfect, so the model will diverge from reality resulting in bad coffee at best and an explosion at worst^{6}How hot would the boiler need to get to explode? Well, we know it can handle at least the 15 bar the pump generates, so (quickly checking steam tables) you’d need well over 200°C. The silicone O-ring will still be safe at this temperature but the thermal fuse ought to have blown. Still, I think, for the sake of rhetoric, we can count a blown fuse as a small explosion!. So we have to somehow use incorporate real-world temperature measurements into our model.

How to do this? I’ll spare you the years of tinkering and jump to what works. There are two sides of the boiler shell that contain heating elements and two plain sides. My temperature sensor is placed one of the plain sides. Let’s try splitting the shell into two components, the plain sides and the element sides, and then model heat transfer between the two. For this we need to stop controlling for temperature and start controlling for total energy. I’ve measured a lot of important physical parameters of the machine so I know the boiler shell has a heat capacity of 549 J/K. Splitting this evenly between element sides and plain sides and picking a suitable heat transfer coefficient (let’s pick 14 W/K out the air – and I’ll explain this number later), the real world graph at the top now compares to the model as follows:

The red and green lines diverge significantly, so we still have some work to do, but, critically, both lines display the same latency. But, as can be seen from the (not to scale) brown line, the total energy has no latency and would be perfect for a P controller!

The divergence between the red and green lines represents two factors. One is the energy which goes into the water, brew head and general body of the machine. This is what causes the red line to curve upwards from about 100s onwards as these elements warm up and use progressively less power. The other is the loss to the atmosphere which, being big and full of convection currents, never gets warmer^{7}By any measurable amount anyway!. This causes the asymptotic slope towards which the red line trends. The green line knows nothing of these things, but it needs to.

In principle, I think we could probably lump the water, brew head and body into one thermal mass, but I don’t want to do that because I would actually like to know what the water and brew head temperatures are. It’s all very well knowing how hot the shell is – but it’s water that brews coffee and the water has to pass through the brew head. So, let’s start with the water which, by my measurements, has a heat capacity of 422 J/K. The brew head has a heat capacity of 616 J/K. I have measured the heat transfer coefficient from the boiler shell to the brew head at 3.6 W/K. We’ll assume both parts of the boiler shell have the same (unknown but we’ll pick something suitable) heat transfer coefficient to the water. This is what we get:

Looking good! There is still a fixed amount of energy to squirrel away somewhere: in the body presumably. And the asymptotes have different slopes. We’ll deal with the latter first. The asymptote is related to the loss to atmosphere which, given that by far the hottest external part of the machine is the brew head, I’ll model as a loss from the brew head to the atmosphere. We just need a heat transfer coefficient, but that’s easy: just tinker until the red and green lines end up parallel. It will turn out that 0.55 W/K is the magic number. Et voila:

Finally, the delta in temperatures between the red and green lines represents about 32kJ which, at about 80°C, represents a heat capacity of 395 J/K for the case. (The numbers I’m quoting now have been fine-tuned by tinkering. This is also how I got the 14 W/K heat transfer coefficient between element sides and plain sides of the boiler shell.) This is the complete model:

## But does it work?

Does it work? Yes it does! Since the measured and modelled plain side temperature are now so close, I implemented the model but used the measured temperature in place of the modelled temperature. i.e. I used the red line instead of the green line. The P controller controls for total energy of boiler shell and water. Here is the result with my real-world Gaggia Baby Class going from cold to a dead-stable 95°C (this time I’ll throw in the water and brew head temperature curves):

From cold to a dead stable 95°C in less than two minutes. And note, now when I talk about 95°C, I am talking about the water temperature. The previous PID controller was aiming for a stable boiler shell temperature. But we see from the graph above that water temperature will be slightly warmer than the measured shell temperature because there is no single temperature for the boiler shell – it is hotter on the sides with the elements.

So, much happiness here!

## To summarise

Here are all the details of the controller in one place. First, the constants:

symbol | value | unit | description |
---|---|---|---|

549 | J/K | heat capacity of boiler shell | |

422 | J/K | heat capacity of water | |

616 | J/K | heat capacity of brew head | |

395 | J/K | heat capacity of body | |

14.0 | W/K | heat transfer coefficient from element sides of boiler shell to plain sides | |

14.7 | W/K | heat transfer coefficient from boiler shell to water | |

3.6 | W/K | heat transfer coefficient from boiler shell to brew head | |

1.8 | W/K | heat transfer coefficient from boiler shell to machine body | |

0.55 | W/K | heat transfer coefficient from brew head to ambient air | |

95.0 | °C | target temperature |

The variables are:

variable | unit | description |
---|---|---|

s | time between iterations | |

W | output power in step | |

W | heat flow from element sides to plain sides in step | |

W | heat flow from element sides to water in step | |

W | heat flow from element sides to brew head in step | |

W | heat flow from element sides to body in step | |

W | heat flow from plain sides to water in step | |

W | heat flow from plain sides to brew head in step | |

W | heat flow from plain sides to body in step | |

W | heat flow from brew head to ambient air in step | |

°C | temperature of element sides in step | |

°C | temperature of plain sides in step | |

°C | temperature of water in step | |

°C | temperature of brew head in step | |

°C | temperature of body in step | |

°C | ambient air temperature in step (I used a constant) | |

J | energy error term for P controller in step |

On each control cycle iteration, the following calculations are made:

is not calculated, but read from the temperature sensor.

Finally…

I used for the coefficient of the P controller. A much higher value would cause errors due to the discreteness of .

As for the P controller that’s it. But since we know how much energy is leaving the boiler + water system, one further refinement is possible: we can also add to the power. This eliminates the need for a fixed error offset to generate the power to counter them. This is what I have done. So my final calculation for is:

## And a sneak preview

I have been also working on an algorithm to get the brew head up to temperature as fast as possible and have been mucking around with increased boiler power. As of today, this is the best I have done^{8}Also visible below, I model power lost to water flow and to latent heat of vaporisation when generating steam. Another topic for another day!:

160s from switch-on until the water and brew head are ready for the perfect shot. For the first 110s, I am measuring and grinding the beans and filling the milk jug. After that, the portafilter is hot and can be removed for dosing and tamping. It’s getting to the point that the machine will be ready before I am, which is what I want. There’s more scope for play, but I’ll take it one step at a time, making sure nothing is melting before pushing the envelope further.

1. | ↑ | That’s not quite all that is needed – the brew head needs bringing up to temperature as well. But that’s a different topic and I’m still working on it. Two minutes is also not an absolute minimum because the boiler power can be increased. Also a topic for another day. For the purposes of this post, we’ll use the stock 1350W the machine shipped with. |

2. | ↑ | Although measuring derivatives of noisy and discrete variables is also fraught! |

3. | ↑ | Bother! Having slaved over this graph, I now notice I used 1600W and I had promised to stay at 1350W. I can, however, assure you that the same latency occurs with the stock boiler. |

4. | ↑ | In the interests of clarity, the two graphs below are taken from simulations rather than a real coffee machine. This allows me to simplify the presentation by removing effects that are not germane to the present conversation. |

5. | ↑ | The smooth tailing-off of the power between 60s and 100s is caused by power loss to other components of the system which are still warming up rather than the controller approaching the target power slowly. |

6. | ↑ | How hot would the boiler need to get to explode? Well, we know it can handle at least the 15 bar the pump generates, so (quickly checking steam tables) you’d need well over 200°C. The silicone O-ring will still be safe at this temperature but the thermal fuse ought to have blown. Still, I think, for the sake of rhetoric, we can count a blown fuse as a small explosion! |

7. | ↑ | By any measurable amount anyway! |

8. | ↑ | Also visible below, I model power lost to water flow and to latent heat of vaporisation when generating steam. Another topic for another day! |

This is absolutely amazing work, I love it.

Now tell me – how do I calculate/measure the constants for other machines? 🙂

A lot of it I did by writing specific calibration code. Some of my calibration logic is in the source code I published here.

I’d have to go back and look at that code for details and, anyway, some of the code might have been changed later. But, for example, if you want to know the heat transfer to ambient you just need to apply a specific power level for a long time and then measure the asymptotic temperature, subtract ambient and divide the result by the power level.

PS sorry for taking so long to reply. I rebuilt my web server some time ago and obviously did not get me e-mail notifications working properly.