
Friday night Covid plotting

 wintertree 20:42 Fri

To gather my various plots into one place; if there's enough interest I'll update each Friday.  Let me know if you have any questions about methods etc; happy to answer but it's a lot of work to cover every plot on the off chance someone wants to know about one.

Plot 1 - Infections

  • Black data - ONS infection estimates
    • Shows the last couple of months of data from the ONS Pilot Infection Surveys, which estimate the number of new daily infections in England by random population sampling.
    • Markers are at the mid-point of the period of data reported by the ONS. Typically they give a single daily rate for a period of 7 days. 
    • Error bars are one half the 95% confidence interval from the ONS data, as an estimate of ± 1 standard error.
    • The trend line is made by fitting a polynomial to the nearest 3-4 data points for each day, then smoothing that jagged fit with a polynomial filter.  I make no claims about the accuracy of the trend line.  It assumes that the underlying data is smooth and continuous and passes near the median values from the ONS; this seems reasonable to me.  I produce this curve because it's useful to have a continuous dataset for the IFR estimates (plot 3 below).
  • Blue data  - UK testing case data from the government dashboard
    • Note: this appears to include the lateral flow test cases from Liverpool, likely because positives then get sent for PCR testing under Pillar 2?  The Liverpool addition is a minor effect either way.
    • The trend line is a 3rd order, 21-day Savitzky-Golay filter.  I pick 21 days as an integer multiple of 7 (and as an SG filter needs an odd number of datapoints, 14 days is out) so as to minimise aliasing (biasing) by the clear 7-day periodicity in the data.  This is a smoother filter than a 7-day moving average but no less responsive.
  • Observations
    • By both measures, infections are falling
    • The fall in cases appears to lag that in infections, but I don't think the data offers much proof either way.
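The SG trend line described above can be reproduced with scipy; a minimal sketch on synthetic data (the Gaussian trend and 30% weekly cycle are made-up numbers, not real case data):

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic daily case counts: a smooth trend plus a strong 7-day cycle,
# mimicking the weekend effect in the real testing data.
days = np.arange(120)
trend = 20000 * np.exp(-0.5 * ((days - 60) / 25) ** 2)
cases = trend * (1 + 0.3 * np.sin(2 * np.pi * days / 7))

# 3rd-order, 21-day Savitzky-Golay filter: the window is an odd multiple
# of 7, so the weekly cycle largely averages out within each fitting window.
smooth = savgol_filter(cases, window_length=21, polyorder=3)

# A plain 7-day moving average, for comparison.
ma7 = np.convolve(cases, np.ones(7) / 7, mode="same")
```

Away from the edges, `smooth` tracks `trend` much more closely than the raw `cases` series does, while still following the shape of the peak.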

Plot 2  - Infections and fatalities

  • Black data -  the same ONS data as plot 1
  • Red data  - Daily deaths (England)
    • From the UK government dashboard, ordered by date of death.
    • The trend line is done using the same SG filter as cases.
  • Observations
    • Both infections and deaths are falling with a ~ 2 week lag from peak infections to peak deaths.

Plot 3 - Estimating the IFR for England

  • This divides the number of deaths on day X by the number of infections on day (X-lag) for various lags.
  • The problem here is that I don't have the longitudinal data for each death giving the time from detection to death, so I have to work without a critical piece of information for calculating IFR while cases and infections are changing.
    • One approach is to use a fixed distribution of times from infection to death.  I have inferred that this is not compatible with the data, as there is no such fixed distribution between the hospitalisations and deaths data (as you will find if you try to do a deconvolution...). 
    • Choice of the "wrong" distribution can bias the results up (too long a lag in a rising exponential phase of the "wave") or down (too short a lag).
    • Instead I use a range of fixed lag times and plot them all.
  • Observations:
    • All the different IFR estimates are converging on ~0.64% right now.  This is because we have had plateau phases in infections and deaths, so the lag "drops out" of the calculation: whatever lag is used, the death and infection rates are the same, because during the plateau phase neither is changing with time.
    • By some slightly circular logic, this suggests to me that the 12-day lag is currently reasonable.  But the relationship between infections and deaths is always shifting with the changing demographics, care levels and the developing hospital situation.
  • Note of caution - the ONS survey is only one estimate of “true” infection levels, as opposed to detected cases.  Different data sources for true infection will give different IFR estimates.
    • The MRC nowcast is giving circa 50k infections/day, in line with ONS
    • REACT 1 Round 6 suggested circa 100k infections/day at the end of October.  This implies an IFR closer to 0.32%.
    • Deaths with Covid on the death certificate are larger than the deaths under the "28-day" rule that I am using.  This implies a higher IFR.  It's not a major effect (about a 1.2x increase??) 
  • Final comments
    • So IFR in the UK is probably in the region 0.32% to 0.64%.  This is using the median value from the ONS survey (0.64%), aligns with a quick glance at the MRC nowcast and is lowered (0.32%) with a quick estimate using the ratio between the ONS data and the REACT data.  Using upper and lower CIs from each of those studies raises and lowers the IFR estimates accordingly.
    • The range 0.32% to 0.64% is not a small number, and is not even a 2x improvement on where we likely were in April 2020. 
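The fixed-lag IFR calculation described above - deaths on day X divided by infections on day X minus lag - can be sketched as follows (illustrative plateau numbers, not the real ONS/dashboard series):

```python
import numpy as np

def lagged_ifr(deaths, infections, lag):
    """IFR estimate: deaths on day X over infections on day X - lag."""
    return deaths[lag:] / infections[:len(infections) - lag]

# Toy plateau: constant infections and deaths, so every lag gives the
# same answer -- the convergence described in the observations above.
infections = np.full(60, 50_000.0)   # infections/day
deaths = np.full(60, 320.0)          # deaths/day

for lag in (7, 12, 21):
    est = lagged_ifr(deaths, infections, lag)
    print(lag, f"{est[-1]:.2%}")     # every lag prints 0.64% during a plateau
```

During a genuine plateau the choice of lag drops out entirely; it is only when the series are rising or falling that the estimates for different lags fan apart.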

Plot 4 - Test and Trace Efficiency 

  • This is the cases found by Pillar 1 + Pillar 2 for England divided by the ONS estimate of infections - i.e. what fraction of infections are found by test and trace and Pillar 1 and Pillar 2 testing.
    • If I'd used the REACT estimates not ONS, these numbers would be about halved (i.e. 2x worse).
    • I've not seen any discussion about why these two differ so much.  Anyone?...
  • SAGE suggest we need to trace 80% of contacts to have an effective test and trace system.  If we're not detecting 80% of cases, I doubt we're tracing 80% of contacts.  It's not anywhere near this simple, but I've put an 80% shade on the plot.
    • It's promising that the fraction has been increasing during the plateau phase in infections - this suggests to me that the system is improving.  Hopefully as cases fall it improves more (the fraction was a lot better in September before cases rose a lot - it should be better still when they fall to that level...)

 Plot 5 - Jitter in case data

  • Plotting daily case data for the whole UK from the government dashboard and a trend line (SG filter again).  The residuals show the clear 7-day cadence in tests - this is not reporting lag, as the data is by specimen date.  
  • The "normalised residuals" should be of magnitude ~ 1 if the noise were from random variation in people; it's a lot larger, mainly I think due to the 7-day cadence
  • I had an insight into the "weekend effect" when Matt Hancock announced earlier this week that the Royal Mail are going to empty more postboxes over the weekend.  If the "specimen date" is the date it enters the laboratory and not that written on the label, this makes a lot of sense.  As we found out earlier this week, the postcode used for bookkeeping positives comes not from what's written on the label but from GP records...
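The "magnitude ~ 1" expectation comes from Poisson counting statistics: if daily counts were pure random variation around a trend, the residual divided by the square root of the expected count would have unit magnitude. A sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure Poisson noise around a flat trend: normalised residuals ~ 1.
trend = np.full(200, 20_000.0)
counts = rng.poisson(trend)
norm_resid = (counts - trend) / np.sqrt(trend)
print(np.std(norm_resid))   # close to 1 for Poisson-only variation

# Add a 7-day cadence and the normalised residuals blow up well past 1,
# as seen in the real testing data.
weekly = trend * (1 + 0.2 * np.sin(2 * np.pi * np.arange(200) / 7))
counts7 = rng.poisson(weekly)
norm_resid7 = (counts7 - trend) / np.sqrt(trend)
print(np.std(norm_resid7))
```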
Post edited at 20:43

 wintertree 20:59 Fri
In reply to wintertree:

Plots 6, 7, 8

  • Grey points - UK level data from the government dashboard
  • Black lines - trendlines from an SG filter 
  • Red lines - doubling times.  For each day, I fit an exponential function to a window of ± 7 days to measure the "doubling time" - how long it would take for the measure to double based on how it's behaving in that window.  
    • A small doubling time is bad, a big one is good.
    • Doubling times in March/April for hospitalisations and deaths were ~ 4 days
    • Where the red line on plot 5 dives below the x-axis, this means it's become a halving time - i.e. cases are now decreasing.
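The doubling-time measurement - fitting an exponential to a ± 7 day window around each day - might look like this (my reconstruction as a log-linear fit, not the author's actual code):

```python
import numpy as np

def doubling_time(series, day, half_window=7):
    """Fit exponential growth in a +/- half_window around `day`.

    Returns days to double; negative values are halving times
    (the measure is decaying rather than growing).
    """
    lo, hi = max(0, day - half_window), min(len(series), day + half_window + 1)
    t = np.arange(lo, hi)
    y = np.log(series[lo:hi])       # a log-linear fit == an exponential fit
    rate = np.polyfit(t, y, 1)[0]   # per-day exponential rate
    return np.log(2) / rate

# A series doubling exactly every 4 days -> the fit recovers 4.0.
days = np.arange(60)
cases = 100 * 2 ** (days / 4)
print(doubling_time(cases, 30))   # ~4.0
```

A small doubling time is bad (fast growth); once the series decays, the fitted rate goes negative and the value flips sign into a halving time, matching the red line diving below the axis.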

Plot 9 - Characteristic times (UK level data)

  • This shows the doubling/halving times for each measure.  
    • The right hand data points on this are very twitchy - they respond to new datapoints and to additional data retrospectively released (cases over up to 7 days typically, deaths over more than 15 days).  
  • All doubling times were increasing after mid-September, indicating that the growth in all measures was continuing but the exponential rate was slacking off.  This reversed, with rates picking up in cases, then hospitalisations, then deaths towards the start of November (the doubling times plateaued / decreased for a bit, but now they are all rising again, which is great - measures are doubling much more slowly)
  • The data for cases and deaths has just tipped over to a halving time - triangle markers without lines.  This is great.  This region of the plot is very twitchy though (as more data is released) so I would take the values as highly provisional.  It'll be interesting to see what it looks like next Friday.

Plot 10 - CFR estimates

  • Measures for all lags are converging right now due to the plateau phase in cases and deaths.  
  • Estimates of CFR as per Plot 3, but using UK level deaths and UK level cases.  Convert to an IFR estimate using the ratio of cases to your favourite random sampling survey or nowcast. 

Post edited at 21:10

 wintertree 22:06 Fri
In reply to wintertree:

Using the geographic data from the dashboard under "Age demographic of cases by specimen date".  This comes with a pre-applied 7-day rolling sum.  This is all cases/day per region (UTLA). 

Plot 11 - UTLA case data

  • This is sorted by the most recent case count in the region.  I'm somewhere uncomfortably far up the left-hand side of the plot. 
  • Spot the university outbreaks...

Plot 12 - Rate of change

  • The y-axis is the case number and the x-axis is the current day-on-day change as a percentage.  Blue text means the rate is -ve and cases are decreasing.  
  • Don't read too much in to font size - it's just an attempt to make things readable with different amounts of overlap.
  • This plot isn't great!  

Plot 13 - Rate of change and acceleration

  • The x-axis shows the day on day % change from plot 12 - how fast cases are growing (+ve) or falling (-ve).
  • The y-axis shows the change in that % over the last 7 days.   
  • Some text is added to interpret the 4 quadrants.
  • This plot also isn't great but it's interesting (to me anyhow).  
  • Tomorrow I'll see if I can add a plot of cases vs time for each of the 4 quadrants.
  • It's not uncommon, it seems, that once cases start decreasing, the rate of decrease slacks off (top left quadrant).  
  • As long as a UTLA stays in either of the left 2 quadrants, its cases are decreasing.
  • Nowhere is very far into the upper right quadrant - nobody wants to be there.


  • The data used to make plots 12 and 13 is always 5 days out of date, and the data is less current still because of the rolling sums applied to the data.  So it's always a bit stale.
Post edited at 22:06

 wintertree 22:15 Fri
In reply to wintertree:

Example traces for each of the 4 quadrants from Plot 13 above.  Having looked at a few, I'm cautious about over-interpreting this plot as the individual UTLA data is quite noisy and there's a lot going on.  I might look at doing more smoothing on the data...

Post edited at 22:16

 marsbar 23:00 Fri
In reply to wintertree:

Thanks for this.  Too sleepy to read it now, but will do when I'm awake. 

In reply to wintertree:

Excellent, many thanks for your hard work

In reply to wintertree:

You have a job? How the f*ck do you have the time to do this?  And thanks very much. 

In reply to wintertree:

Don't work for GCHQ do you?

It would be interesting to have UKC allow you to write an article. I think a lot of readers would be interested, especially comparing the actual data you produce vs what we get via media and government press releases.

Thanks for the work.

 wintertree 11:36 Sat
In reply to George Ormerod:

I recently dropped one job after some introspection brought about by lockdown so don’t have the same time pressures I used to...

The plots don’t take that much time - I’ve been adding and refining bits for a few months.  It only takes a few minutes to drop a new day’s data through the plotting pipeline, and there’s been maybe 18 hours put into the code over 5 months.  

I did end up spending too long on plot 12 which I’m not yet happy with.  Si dH has rightly pointed out that this stuff has to be considered at a local level, and trying to present all the different facets of local data for quick assimilation across the country isn’t the most obvious set of plots to make.  Still, it’s not like I could have gone down to the pub instead...

In reply to Cwarby:

I don’t think UKC would want to lend their editorial identity to what I have to say.  In terms of quality of understanding and insight, the press’s reporting of the daily numbers is not just unilluminating - it veers into nonsense when speculating on causes behind what is nothing more than noise on the system, and swings between sensationalism and missing important signs of problems.  And what I have to say there pales into insignificance compared to my take on one of the organisations “professionally” presenting and interpreting evidence on the situation.

 Offwidth 11:49 Sat
In reply to wintertree:

Thanks again for all this. On the UKC point they allow opinion pieces all the time. It would be fabulous to see an article summarising your views and concerns over the year and for the future.

In reply to Offwidth:

> Thanks again for all this. On the UKC point they allow opinion pieces all the time. It would be fabulous to see an article summarising your views and concerns over the year and for the future.

I agree.  I just need to get to a bigger screen to look at all this properly!

 wintertree 15:08 Sat
In reply to wintertree:

Working to combine all the information

  • Geographic - location
    • I've made a map
  • Absolute number of cases/day
    • The more vivid a colour is on the map, the higher the cases/day value
  • Growth rate (of cases/day) - are cases rising or falling?
  • Trajectory in the growth rate of (cases/day) -
    • Becoming more positive - the growth is getting worse, which could mean growing faster or tipping over from decay into growth
    • Becoming more negative - the growth is getting better, which could mean growing less fast, or tipping over from growth into decay

To do this, I've re-assigned the quadrant colours from Plot 13 to work better, and made a map.

I'm not best pleased with the measurement of the rate of growth in cases and the acceleration in that; I might get the method re-worked for next week.

Post edited at 15:20

 wintertree 16:12 Sat
In reply to wintertree:

Plot 14a - fixing plot 14 which looked nice but was totally messed up (it was assigning colour based on rates in two different weeks, not rate in one week and the change between rates...)

Plot 15 - the cases/day summed for each of the four separate quadrants of Plot 13a and Plot 14.

The cumulative plot on the left shows how growth in "red" regions has taken over driving the total numbers for England as the T2/T3 regions in green and orange started to decay.

The sum of the "red" regions looks like it's tipping over in to decay, but not within the timescale I measure the rate on; in a few days those regions might start turning blue or green... 

There's a lot of blue areas - cases are falling but the halving time of the cases is getting longer - this means that the fall is becoming less aggressive.  Lots of interpretations of this; one is that household transmission still has to play out after the initial effect of closing hospitality kicks in.  These rates I plot are based on the % day on day change so they normalise for the exponential mechanic.

I like flipping between plots 13a and 14a so I've re-attached 13a.

I think this analysis is quite "twitchy" to the variation in the data, so it's not one to over-interpret, but it makes a nice snapshot.  I look forward to seeing a lot more green and blue next Friday.

For some reason a couple of regions are missing from these plots...

An erratum on my 22:15 plot with the individual regions - the y-axis is the 7-day rolling sum not average.

Post edited at 16:20

 Si dH 17:23 Sat
In reply to wintertree:

I like your map plots, thanks. The guardian do a reasonably good one showing direction of change that I tend to use alongside the dashboard map to get a quick snapshot each day of the latest developments, but I'm not aware of anywhere else showing acceleration/deceleration. I assume you only have that data by storing past data yourself?

Is it showing d(rate of change in cases per week)/per day or d(rate of change in cases per week)/per week?

I need some nomenclature to avoid writing things out.

 wintertree 17:36 Sat
In reply to Si dH:

Thanks.  The data source for this is the joint demographic and geographic (UTLA) breakdown from the gov dashboard; this is a time series by day with 5-year age bins within each UTLA.  It’s got a 7-day rolling sum applied inside each age bin presumably for privacy reasons.  This only appeared online a few weeks ago.

The velocity - the rate of change - is the finite differences method over 7 days expressed as a percentage of the most recent case rate.  So it’s

velocity[day X] = (cases[day X] - cases[day X-7]) x (100 / cases[day X])

The acceleration is just the difference in velocities over a week so the unit doesn’t change from that of velocity other than a /week term.

acceleration [day X] = velocity[day X] - velocity[day X-7] 
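The two formulas above translate directly to code; a minimal sketch on a toy series:

```python
import numpy as np

def velocity(cases, day):
    """Week-on-week change as a percentage of the most recent case rate."""
    return (cases[day] - cases[day - 7]) * (100 / cases[day])

def acceleration(cases, day):
    """Change in velocity over a week (same %-units, per week)."""
    return velocity(cases, day) - velocity(cases, day - 7)

# Toy series growing by a fixed 50 cases/day: velocity is positive
# (cases rising), but acceleration is negative because linear growth
# means the percentage growth rate is easing off.
cases = 100 + 50 * np.arange(30)
print(velocity(cases, 21))
print(acceleration(cases, 21))
```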

I should move to using an SG method to do the differentiation more holistically, or fit an exponential to the period and use that to measure the growth rate.  Before I do I want to try and deconvolve the 7-day box filter as otherwise I’m compounding the blurring effect of the box filter with whatever I do, but that’s all a bit too much like hard work...

The other thing I thought would be interesting is to plot the average age vs time for each category, within the confines of the age bins.

I think the acceleration is an important concept to look at as it shows areas with rising cases that are tipping over - the orange areas.  I’m less convinced that it tells us anything so profound about green vs blue regions.  

Post edited at 17:46
 RobAJones 18:04 Sat
In reply to wintertree:

Thanks, I haven't read it fully yet but have had a quick skim through.  I appreciate now that the IFR is probably around half what it was in March - mainly down to improved treatment, perhaps a higher proportion of cases in younger people, but not because of any mutation in the virus.  I think I have been a bit stubborn in accepting this.  Most of my arguments are with people of the "let it rip" persuasion who often quote very low IFRs.  However, I don't believe the virus is any less deadly, so if we "let it rip" now, the IFR would be the same as (well, very similar to) what a "let it rip" IFR would have been in March.

 Si dH 18:31 Sat
In reply to wintertree:

Ok thanks. I can see why that would give a lot of noise. I had assumed it was using weekly averages or sums to produce the velocity. I think it would probably be fairly straightforward to do that?

Velocity (day X) = [sumcases(dayX : dayX-7) - sumcases(dayX-1 : dayX-8)] *100 / sumcases(dayX : dayX-7)

 wintertree 18:39 Sat
In reply to Si dH:

Yes; that should be straightforward, but I didn't do it as the data has already had a 7-day moving sum (or scaled average) applied to it as part of its release process.  If I did do it, each velocity measurement would span 14 days, and so the acceleration measurement becomes very delocalised - and delocalised in a bad way, using only a pair of box filters, which is really not very appropriate for exponential data.

The code for these plots needs a bit of a tidy up to make it more idiot proof (i.e. me) as it's looking a bit fragile and mistake prone.  Then I'll have another look.  I think deconvolving the box filter out will be simple: the data is all positive, which I think means there are no pathological/degenerate entries, and I don't care about the correct assignment of cases to days in the first week of the time series - those values are very small compared to values now, so the error from whatever approach is used for the initial 7 days will be small, and will anyhow not affect velocity.  Getting back to the raw data seems like a much more appropriate starting point...

Post edited at 18:40
In reply to wintertree:

What are your conclusions about where we currently are overall?

Mine - from much less analysis and thinking than you've done (orders of magnitude less) is that the 2nd wave might be peaking or just about have peaked with deaths at about 400-500/day, but we really need another week of figures to have confidence that this is really so.

 wintertree 19:00 Sat
In reply to Michael Hood:

I think you've got it right.  

My main worry had been that small but rising cases in some areas can be masked by high but falling cases in other areas, giving premature or false hope.  This has been happening since the very start of the pandemic, both with geographic regions and demographic bulges (especially university outbreaks)

Figure 15 digs into this, and it says to me that even areas crudely classified as "rising" are starting to bend over into level or falling behaviour, so we're past the point where that "masking effect" could happen and things should keep falling - and falling faster as more of the classifications from Figure 15 tip over.  

The most recent day feeding in to Figure 15 is November 15th - 10 days after lockdown started and so far enough in that the effects of lockdown are starting to show in cases.

So, I have confidence that the plateau of cases is real and the decay is here and growing.  The plateau has already translated in to a plateau in hospitalisations and likely deaths (characteristic time plot up thread).

So - all that analysis effort and the only difference is I'm confident to call it a week before you... Not really worth it is it!  

Still, the pucker factor remains very high over the hospital situation over the next few weeks; hospitalisations may be in a national level plateau, but that's people going in to hospital - it takes them some time to come out again, so the total occupancy may keep rising.   Further, winter is just around the corner and it remains to be seen what direct and indirect effects that will have on Covid and the - so far - absent influenza season.

In reply to wintertree:

Ah but in any argument you can back it up with numbers and analysis, whereas mine isn't much more than gut feel.

 wintertree 21:53 Sat
In reply to Si dH:

Yup, with a bit of effort the 7-day sum can be deconvolved out; a rather ugly plot below shows the verification of the method and notes something of importance.  The method is shown being tested in plot A.

  1. The blue crosses are the government data - divided by 7 to give a 7-day rolling average.
  2. The grey triangles are the result of a deconvolution to get the actual case levels for each day
  3. The small red crosses are the data from (2) fed through a 7-day rolling average.  If everything worked, they should land precisely on the blue crosses, which they do.

Having recovered the actuals data for each day, an SG filter can be applied to smooth the random fluctuations whilst preserving a higher order shape than a 7-day box filter, and without introducing the same lag as the box filter.  This is shown in plot B:

  • The grey crosses are the original 7-day moving averages from the government dashboard
  • The black curves are the SG filtered version of the actuals recovered from the dashboard.
  • What we can see is that both rising and falling phases of the 7-day moving average lag the filtered version of the actuals.  This means that the recent drop in actuals, which has just started, is much more visible in the recovered data than in the government dashboard data.
    • A 7-day moving average always lags when applied up to the end of a time series as it's a fully asymmetric kernel at all times. 

This is why I didn't want to do more averaging in the measurement of velocity and acceleration. The next step is to use the SG filtered, deconvolved data to get a more up-to-date and less noise sensitive map plot.
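The deconvolution can be done recursively: the rolling average on day n is the mean of the 7 raw values ending on day n, and since the series starts from known zeros, each raw value can be peeled off in turn. A minimal sketch of the idea (my reconstruction, not the author's actual code), verified exactly as in plot A - raw data, through the rolling average, and back:

```python
import numpy as np

def deconvolve_7day(avg):
    """Recover daily values from a trailing 7-day rolling average.

    Assumes zero cases before day 0, which makes the solution unique
    (this is the "known initial conditions" point made below).
    """
    raw = np.zeros(len(avg))
    for n in range(len(avg)):
        # 7*avg[n] is the rolling sum; subtract the six known prior days.
        prior = raw[max(0, n - 6):n].sum()
        raw[n] = 7 * avg[n] - prior
    return raw

# Verification: raw -> 7-day rolling average -> deconvolve -> raw again.
rng = np.random.default_rng(1)
true_raw = rng.integers(0, 1000, size=60).astype(float)
avg = np.array([true_raw[max(0, n - 6):n + 1].sum() / 7 for n in range(60)])
recovered = deconvolve_7day(avg)
print(np.allclose(recovered, true_raw))  # True
```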

In reply to wintertree:

I was expecting a link that would allow us to eavesdrop on tonight's zoom call between Nigel Farage, Gupta, Chris Evans, the "Covid Recovery Group" MPs, etc, and all I got was your bloody graphs!

Seriously, thanks for amazing amount of unpaid work, it is extremely useful in trying to understand what's going on.

In reply to wintertree:

Mr Tree, that is excellent, thank you for taking the time to post it. It certainly looks to me that we have passed the peak of this second wave for the most part and are starting on the down slope again. 

Someone mentioned the Guardian doing similar? I think I would rather trust yours with no political agenda. 

 wintertree 08:38 Sun
In reply to Jon Stewart:

Thanks Jon.

> it is extremely useful in trying to understand what's going on.

That’s why I’ve been doing it; plotting stuff is often my best way of understanding something - not so much the outputs but the process of doing it.

> the "Covid Recovery Group“

I see they’re gathering momentum and just wrote to the PM stating that the post lockdown tiers could be a “cure worse than the disease”.  Do you think they’ll also tell Johnson what their Doctor friend was telling them only the other day about hospitals being quiet?...

Post edited at 08:39
 wintertree 10:07 Sun
In reply to Dax H:

Thanks Dax. I'll need to dig out the Guardian map.  I've not had problems with their reporting but I get a lot more out of making my own plots, mainly because it forces me to learn about the limitations of the data etc...

Having got the deconvolution working, I've redone my dashboard map where colour shows the direction of cases and it's "acceleration" - is the case rate (characteristic time, which is invariant of the absolute number of cases and is a proxy for the R number) going up or down?

These maps are about 5 days more current than the previous ones, as I've downloaded data one day newer and the deconvolution removes the blurring, lag effect of the 7-day rolling average.  I'm going to have to make a movie of this over time and put it on YouTube...  This is also doing the measurement of the direction of the case rate in a much less noise sensitive way than before and a better measurement of the acceleration which has reduced the blue it seems...

Post edited at 10:11

 Si dH 15:37 Sun
In reply to wintertree:

> Thanks Dax. I'll need to dig out the Guardian map.  I've not had problems with their reporting but I get a lot more out of making my own plots, mainly because it forces me to learn about the limitations of the data etc...

It's interesting how different people learn. I don't have anything like your skill at processing the data and doing quantified statistical analysis myself but I'm very attuned to looking at limitations and trends. I get most out of poring over the covid dashboard map every evening, it's an incredible source of insight into the pandemic if you look at it every day and spot the trends on a local and regional level (whether or not this is good for mental health is a different question). However the change in rate that the dashboard map shows can only be accessed by clicking on an individual LA or MSOA, which is time consuming if you are interested in more than a very small area. You can also move the slider bar too but then you lose any granularity by day.  So I use the daily guardian 'on the rise' map too, which is just a visualisation of the same data nationally showing whether an LA is increasing/decreasing slow/fast since the previous week. It's nothing special but useful for me alongside the dashboard.

Really appreciate the time you spend to do some of the statistics. The map you posted most recently, with the deconvolution of current data, looks both very useful and very promising for the current situation if it is accurately removing the lag effect of the 7-day average. I don't really understand what you've done, would the method usually increase uncertainties much?

Post edited at 15:49
 Si dH 15:55 Sun
In reply to Michael Hood:

> What are your conclusions about where we currently are overall?

> Mine - from much less analysis and thinking than you've done (orders of magnitude less) is that the 2nd wave might be peaking or just about have peaked with deaths at about 400-500/day, but we really need another week of figures to have confidence that this is really so.

For what it's worth, although cases have almost certainly peaked now, I'm not as confident about deaths. On a national average basis, case rates went through a significant bump at the beginning of November before dropping again; hospitalisations then did the same but so far deaths have not, as far as I can tell. It might be that they don't because the variation in time to death smooths out the kink in the curve such that it becomes unnoticeable, but equally there could be another bump yet before deaths drop on a continuous basis. 

You can see the effect I'm talking about in Wintertree's doubling time graph.

On a more local level there will be lots of areas where hospitalisations and deaths are still increasing; equally there are others that peaked a while ago. Liverpool hospitalisations have been going down for a month.

Post edited at 15:56
 wintertree 16:16 Sun
In reply to Si dH:

> I don't really understand what you've done, would the method usually increase uncertainties much?

In short, I think no it doesn't.

In long:

  • The original "raw" data is a time series of data by date.  The data has a lot of variation in it - random statistics on people and also the massive "weekend effect".  They "blur" it with a 7-day moving average which smooths out a lot of the noise.  This blurred data is what goes on the dashboard for download.  (The blur isn't applied to the main datasets, just things like the UTLA+age breakdown).  This blur also introduces a variable lag (it depends on what the data is doing somewhat).  
  • Mathematically, applying a moving average is a convolution of the raw data with a filter kernel.  A deconvolution is the reverse mathematical operation: it takes the output data and one of (the kernel, the input data) and finds the other.
  • The deconvolution takes that blurred data and asks/answers the question "what data must have gone in to the moving average blur to produce the dashboard data?". It does this by finding the "raw" data that must have gone in to the filter.  In general there isn't a unique answer to that question (that is different sets of noisy data could all produce the same output), but for this data there is a unique solution as we know the initial conditions - 0 cases/day - which rules out degeneracy.
  • The 7-day moving average data is still quite "noisy" in that the weekend effect fluctuations and other sources of variation feed through to measurements made from it.
  • The recovered "raw" data doesn't have the lag of the 7-day moving average, but it is a lot noisier, which means simple measurements made from it will be very noisy.  I smooth it with a 21-day 3rd order polynomial filter; this does a qualitatively similar job of redistributing the noise to the 7-day moving average (people with more stats knowledge than me may disagree), is more responsive to trends in the data, and - critically - doesn't introduce lag like the moving average does.  This is important because we're living in the same time as the leading edge of the dataset and we care most about the now.    (If we were doing the moving average on a full dataset, not one truncated at the now, it could be balanced so as not to introduce any lag.)
  • The differentiation is done by the same polynomial filter.  So I think the uncertainties in the case numbers and their rate of change are qualitatively less noisy than those made from a 7-day moving average, but don't ask me to quantify it; the methods I know are based around randomly distributed noise, and the noise in the testing data is anything but...
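The smooth-and-differentiate step in the last bullet maps onto scipy's `savgol_filter`, whose `deriv` parameter returns the derivative of the local polynomial fit; `mode="interp"` fits polynomials at the edges of the window rather than padding, matching the lag-free edge handling described above. A sketch on a known exponential:

```python
import numpy as np
from scipy.signal import savgol_filter

# Daily series with a known per-day growth rate of 0.05.
t = np.arange(100, dtype=float)
y = np.exp(0.05 * t)

# Smooth and differentiate in one pass: deriv=1 returns dy/dt from the
# local 3rd-order polynomial fit; mode="interp" fits a polynomial at the
# edges so the last point (the "now") is covered without extra lag.
dy = savgol_filter(y, window_length=21, polyorder=3, deriv=1, mode="interp")

# Per-day exponential growth rate: d(log y)/dt = (dy/dt) / y.
growth = dy / y
print(growth[50])   # ~0.05 in the interior
```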

In terms of poring over data and mental health - turning this into a hobby project lets me park it completely out of my head when I'm not working on it, and it leaves me pretty immune to the news.  It has rather displaced my other hobby projects of trying to type up and edit some children's stories and getting a wild animal detector/classifier working on the CCTV to buzz me when the ******* rabbits are about.

Post edited at 16:38
 Hardonicus 18:44 Sun
In reply to wintertree:

Have you any recent data on excess deaths? I can only find info up to 6th Nov or thereabouts.

In reply to wintertree:

Can't you eliminate the lag of the moving average filter just by offsetting it in time? I agree that if you take the last 7 points, sum them and divide by 7, then call that the current day's result, there is a lag. However, if you relabel that result to be an earlier day then the lag goes away. 

On a Monday, if the reported data are the mean of previous Tuesday-Monday results then just label that as previous Friday's result and the lag is gone. 

I think the danger with deconvolution is that errors propagate and could skew the data. 

 wintertree 20:43 Sun
In reply to richard_hopkins:

> Can't you eliminate the lag of the moving average filter just by offsetting it in time? I agree that if you take the last 7 points, sum them and divide by 7, then call that the current day's result, there is a lag. However, if you relabel that result to be an earlier day then the lag goes away. 

I agree with all of that.  However, if I do as you suggest I then know the data up to day X - 3.5 (or so), whereas if I do the deconvolution I know the data up to day X.  If this data were all some historic event it wouldn't matter one jot - for the reasons you give - but in this case I want to know what the current situation is, and the deconvolution approach gives insight 3-4 days closer to now than the moving average.  I still need to deal with the "noise" (weekend sampling effect etc) in the data that had been smoothed by the 7-day filter, but I can use an implementation of SG filtering that has a symmetric kernel so as not to introduce lag, and that dovetails into fitting polynomials at the edges of the window (where the lack of future data prevents a symmetric kernel being applied); this means the filtered data remains lag-free up to and including the time of the final data point on day X. 
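A quick illustration of the lag point (a sketch with a made-up exponential series standing in for rising cases; scipy's `savgol_filter` with `mode="interp"` does the polynomial edge fitting I describe):

```python
import numpy as np
from scipy.signal import savgol_filter

# A smoothly rising series stands in for growing daily cases.
t = np.arange(60, dtype=float)
series = 100.0 * np.exp(0.05 * t)

# mode="interp" fits a polynomial to the last window_length points instead
# of borrowing (non-existent) future data, so the smoothed value on the
# final day is lag-free.
smoothed = savgol_filter(series, window_length=21, polyorder=3, mode="interp")

# Trailing 7-day moving average for comparison: its value effectively
# belongs ~3 days in the past, so it undershoots a rising trend at "now".
ma7 = np.convolve(series, np.ones(7) / 7.0, mode="valid")

# The SG estimate at the leading edge sits much closer to the true value.
assert abs(smoothed[-1] - series[-1]) < abs(ma7[-1] - series[-1])
```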

> I think the danger with deconvolution is that errors propagate and could skew the data. 

You are right to be cautious.  I check all the deconvolutions by running the result forwards again through a 7-day moving average and checking that this matches the input (see the plots on [1] for example) so if errors exceeded typical rounding errors, the code would flag it.  Further, the lead-in period is always 0 cases/day which gives an unambiguous starting point for the deconvolution - this gets rid of most of the likely problems.  The dynamic range of the data is small, so numerical instability from the finite precision FP maths also isn't a worry.   

Edit: obviously it would be nice if the dashboard just had the raw data instead of “helpfully” filtering it.  I hope whoever prepares briefings for local councils and cabinet have access to the raw data and use a filtering approach that recognises the need for up to date information when managing a crisis...

[1] - https://www.ukclimbing.com/forums/off_belay/friday_night_covid_plotting-728011?v=1#x9339702

Post edited at 20:55
 wintertree 20:44 Sun
In reply to Hardonicus:

> Have you any recent data on excess deaths? I can only find info up to 6th Nov or thereabouts.

I've stopped paying regular attention I'm afraid - it's wide open to interpretation right now.  The year is so different in so many ways that this bulk measure can't be interpreted in any particular way as long as there's a reasonable lid on the direct covid deaths - whereas in March/April it was an unambiguous sign things were going badly wrong.  

Post edited at 20:44
 RobAJones 21:15 Sun
In reply to Hardonicus:

That report for 6th was released on 17th.  Next one is due on 24th  

In reply to wintertree:

> This is also doing the measurement of the direction of the case rate in a much less noise sensitive way than before and a better measurement of the acceleration which has reduced the blue it seems...

My calcs show overall UK 'live cases' now dropping and the rate of drop increasing. Yesterday -2396, Saturday -1153, Friday -538.  This comes against rises of 1908 (13th),1685,2590,3001,2881,2251 & 903 the previous 7 days. I'm looking forward to seeing how the rate of fall over the next week or so pans out.

I see from the BBC that there's going to be a fair bit of relaxing of restrictions over xmas in conjunction with enhanced testing; it will be interesting to work out the likely rate of rise and the date of any consequent lockdown.

 Hardonicus 17:10 Mon
In reply to wintertree:

I think you're missing a trick not analysing excess death data. This is 'gold standard' information in some regards although as you say it has a number of input parameters. The naysayers are currently screaming that excess deaths are not rising (although this has started to change over the last 4 weeks or so).

Post edited at 17:10
 wintertree 21:19 Mon
In reply to Toerag:

> My calcs show overall UK 'live cases' now dropping and the rate of drop increasing. 

Yes, it's surprising me just how fast cases appear to be dropping.  The halving time for cases appears to be much shorter than the first time around, but so much has changed about testing since the first lockdown that I'm wary of reading much into it for now.  It'll be very interesting to compare the halving time for hospitalisations between lockdowns 1 and 2 - unlike cases, this won't be drastically changed by the roll out of pillar 2 testing.
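For what it's worth, the halving time is just the exponential-decay figure, which falls straight out of two case levels and the days between them. The numbers below are hypothetical, for illustration only, not actual dashboard data:

```python
import math

def halving_time(n_start, n_end, days):
    """Days for cases to halve, assuming exponential decay between two
    observed daily-case levels (n_end < n_start)."""
    rate = math.log(n_end / n_start) / days  # negative while falling
    return math.log(0.5) / rate

# Hypothetical figures: 25,000 cases/day falling to 15,000/day over 14 days.
t_half = halving_time(25000, 15000, 14)  # ~19 days to halve
```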

> I see from the BBC that there's going to be a fair bit of relaxing restrictions over xmas in conjunction of enhanced testing, will be interesting to work out the likely rate of rise and date of consequent potential lockdown.

Really hard to say.  

I'd be surprised if cases bottom out at less than 10,000 per day after lockdown ends.

  • If we get down to that kind of level and go straight from lockdown to a stricter Tier system, we just might be okay; that could be enough to keep the growth in cases low enough that the gradual improvements to the test/trace/isolate system and the use of mass testing in hotspots could be enough to keep paring infections back and keep us out of another lockdown until we pass out of Winter.  Although there's going to be household mixing at Christmas, it displaces workplace mixing and schools, so there are some balancing factors.   If vaccinations can start in December this further slows the growth and relaxes the healthcare pressure from Covid.
  • Even if cases bottom out higher than that, I think it's pretty compelling that Tier 3 was working to drive R<1 regionally, so if the government are more willing to enact the higher tiers sooner we could be spared any more lockdowns.   I think maybe they're gradually learning that taking softer measures sooner spares harder measures later... ?
 wintertree 21:21 Mon
In reply to Hardonicus:

Maybe.  I don't know what I'd do with the data though - there are no obvious questions to ask of it, no obvious ways to eke out more information by plotting it differently or combining it with anything.  It is what it is, and it lacks too much context to be useful right now.

> The naysayers are currently screaming that excess deaths are not rising (although this has started to change over the last 4 weeks or so).

They may well go back to being low soon, with the flu season apparently stalled before it gets started.  Won't stop the loons from ignoring that they're low only because of the covid control measures... 

 Si dH 21:29 Mon
In reply to wintertree:

I think the plan explained today is actually pretty sensible and cautious and will keep infections down outside of tier 1 areas. Tier 2 is learning from what Tier 3 did before, which mostly worked. Tier 3 will be stricter still. And they said more areas will be in the higher tiers than before. If they implement it properly* I'd be very surprised if infections rise significantly again. There'll be a bump at Christmas but if it's 3-4 days it won't change the long term course.

*The key here will be whether they can react to local data fast enough. We have seen recently how quickly a city in tier 1 can suddenly become a hotspot (Hull, Bristol, for example.) I'm not convinced the arrangements are in place to react fast enough to prevent that happening again. However, the impact should remain localised.

 wintertree 21:41 Mon
In reply to Si dH:

Agreed; stricter tiers, more readiness to use higher levels etc - it might hold this time round.  On the caution side, we're still going in to winter.

For it to work this time, local authorities need not to resist T2/T3 classifications; last time round we had local politicians here taking the line "we need more time to see if T2 works" - the point being, by the time they found out it hadn't worked, enough future hospitalisations were locked in that the only remaining option was lockdown (Tier 4 in all but name...), I think.

Part of the problem last time I think was having less financial support for the hospitality industry under T3.

I agree on the need for rapid reaction - frustratingly there's not much sign of the various lags in the cases data decreasing; and from what I can tell that lag corresponds to the lag in entry of the data to the test and trace system as well as all the reporting outputs that go to geographic breakdowns.  I really hope the UTLA and MSOA data is analysed by local authorities without the brain-dead 7-day moving average as this further compounds all the reporting lag and means you're looking ~8 days behind reality in total.  

 Hardonicus 00:26 Tue
In reply to wintertree:

Any thoughts on the LFT rollout? Some aspects of the test appear as shady as the PCR.

In reply to Hardonicus:

>  The naysayers are currently screaming that excess deaths are not rising (although this has started to change over the last 4 weeks or so).

They've been comparing current excess deaths with the 5-year average values seen on graphs, but what they're not taking into account is the number of actual non-covid excess deaths (probably because no-one seems to be publishing this info).  I believe non-covid excess deaths will be well below average - warm weather and less flu about due to covid restrictions - so the excess deaths total isn't particularly high because there are fewer non-covid excess deaths than normal. Plus there's the delay between infections and deaths - the infection curve didn't really get going until October, and deaths resulting from those infections are only really getting going now.

 Si dH 07:57 Tue
In reply to Hardonicus:

> Any thoughts on the LFT rollout. Some aspects of the test appear as shady as the PCR.

I'll say a couple of things on this. Being relatively local I followed the Liverpool trial information quite closely.

There were a few detractors of the Liverpool programme before it started. As far as I could tell these were not entirely rational, which was unexpected given who most of the writers were, so I think they were borne of either ignorance of the specific programme or having an axe to grind. The majority of the criticism came down to (1) test accuracy and population consent weren't good enough for a screening programme and (2) there are too many false positives for the test to be used in a population with low prevalence. However, in practice (1) doesn't matter if your aim is simply to find and take out as many asymptomatics as you can to reduce transmission, rather than trying to screen everyone in order to confirm there are no positives around (clearly not the case at the moment), and (2) isn't a problem as long as you confirm all the positives with a follow-up PCR, which is what is done in Liverpool.

It's not yet clear whether the above will apply to future applications or not. If some places start using LFT without a confirmatory PCR for positive cases then you might get as many people isolating unnecessarily as you do actual positive cases in areas with low prevalence. I would say that confirmatory PCR should be considered an essential element of the programme but I don't know if it is. (Edit to say: from something I read yesterday, I understand that the student Christmas testing programme will include confirmatory PCR for positive cases.)
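To put rough numbers on the false-positive point, it's just Bayes' rule. The sensitivity and specificity figures below are illustrative assumptions, not official LFT performance data:

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: the fraction of positive test results
    that are true positives, via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# With assumed 77% sensitivity and 99.7% specificity:
# at 0.1% prevalence, most positives are false...
low_prev = ppv(0.77, 0.997, 0.001)   # ~0.20
# ...while at 2% prevalence, most positives are real.
high_prev = ppv(0.77, 0.997, 0.02)   # ~0.84
```

Which is exactly why the confirmatory PCR matters so much more in low-prevalence areas.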

The data from Liverpool has been spun by the Government to make the trial look more effective than it has been and therefore support their policy. They are claiming the reduction of infection is down to the testing, whereas in practice the vast majority of the fall came before it started and it had no discernible impact on the trend. The trend in Liverpool has closely followed that in other adjacent areas (Sefton, Knowsley.)  Having said that, finding and isolating 2000 positive cases can be no bad thing and outside a lockdown the effect would probably have been greater - but there is no evidence. In areas in the new tier 3 I suspect the effect will continue to be quite marginal. Their rates will fall anyway.

The local director of public health made an interesting point yesterday. Apparently take-up in affluent areas of Liverpool has been very high - some over 50% of population - but in most deprived areas has been low. That's where they believe there is more infection so they are going to move the testing locations and refocus their campaign specifically to target more deprived areas from December. This is something other cities will need to consider too.

One criticism of the Liverpool programme would be that in the first few days there were big queues outside test centres. This isn't a very good thing if you are trying to keep people apart. Other areas should probably phase their programmes a bit by ward or something so that they don't get all the enthusiastic people turning up in the first two days.

Final lesson from the trial for now, the data published mixed up the results of LFT and asymptomatic PCR with the positive case numbers in the test and trace output and the covid dashboard. To my knowledge there is no way of distinguishing the positive cases from asymptomatic PCR from those from the usual symptomatic PCR testing route. This has two impacts: (1) you lose the opportunity to compare LFT positivity against PCR positivity for a similar (asymptomatic) population and therefore get a better idea how well the LFT is working, (2) if areas with mass testing include asymptomatics in their published infection rate figures then their numbers will be increased slightly vs those without mass testing, particularly if they pick up lots of people in areas of higher prevalence. This might make it harder to make local policy decisions about restrictions. Of course, all the above data might be available and able to be used by people who are important, but it isn't there publicly,  so we don't know. I tend towards scepticism because of how poorly run test and trace seems to be.

I really like the idea of using LFT to remove the need for close contact isolation if they can get it to work practicably without people needing to travel much. Potentially it could mean fewer people are discouraged from having a test by the prospect of their families having to isolate, which can't be a bad thing. And obviously there is an economic benefit of fewer people isolating.

Sorry, I'm off the topic of Wintertree's graphs!

Post edited at 08:10
 RobAJones 09:25 Tue
In reply to Hardonicus:

I found this graph interesting: the "spike" in deaths this year will be similar to Spanish flu/WW1 and the start of WW2, rather than a "bad" flu year.


Post edited at 09:39
In reply to RobAJones:

Seems to me that there was a lot of sex going on after both world wars 😁

 Offwidth 12:52 Tue
In reply to RobAJones:

Interesting stuff despite only providing data to 2014.

Two separate points caught my eye, both based on increasing trends to 2014:

27% of births had mothers born outside the UK

47.5% births were outside a marriage or a civil partnership.

 wintertree 14:11 Tue
In reply to Hardonicus:

I think Si dH gives a far better analysis of the situation around the LFTs than I can.

My only additions are...

  1. To say that I agree - if LFT +ve results are followed by an RT-PCR test to confirm infectious status, they are an additional, useful tool for controlling the virus in areas of high and rising prevalence.
  2. That I am far from convinced either LFTs or RT-PCRs are a suitable means to "clear" the student body to return home.  In both cases they will normally report as negative someone who has just been infected, is incubating the virus, and will go on to become infectious.  Ideally, such testing would follow a period of strict isolation for 4-5 days (so any infection develops to the point it is most likely to give a +VE test result), but many halls of residence are not set up to support this level of isolation.  If the test results are followed up by rapid contact tracing to find and isolate contacts of those who are detected +VE (i.e. people who are upstream of them in the chain of infection), earlier stage, potentially infectious but -VE testing people can be identified, isolated for another 5 days and re-tested before being cleared to go home.  None of this is mandatory however, and not all universities want to play along with the testing and "evacuation" timescale either.  I think that the moment lockdown ends is the best time - every day after that means a higher prevalence of students with early stage infections who will test negative, travel home and go on to become infectious.  Whatever happens, it needs to be made astoundingly clear to the students and their parents what a -VE LFT result actually means, which is "not likely to be infectious at this point in time" - it does not mean "Covid free".
Post edited at 14:14
In reply to wintertree:

Spot on.  The evidence for a day 5-7 test is out there -  Jersey have been doing 'test on arrival' since the summer and now have community seeding due to negative tests on arrival becoming infectious later on. I think a test on day 5-7 will detect about 85% of cases, rising to 95%+ at day 10 or so.  Any government planning on using anything other than a day 5-7 test is not going to contain the spread.

 RobAJones 19:12 Tue
In reply to Offwidth:

> 47.5% births were outside a marriage or a civil partnership.

The last parents' evening I did, only 5 of the 28 appointments were with Mr & Mrs "Child's Surname".  I'm not sure many of those marriages last 12 years.

