
UKC Averages - Why use the mean?

This topic has been archived, and won't accept reply postings.
 GGD 03 Aug 2020

Looking at my own averages it seems like they're calculated as a mean. Am I mistaken?

Seems like an odd choice if so, surely a mode would be more informative?

Thoughts, comments, explanations would all be valued, having a slow evening.

Post edited at 20:41
 d_b 03 Aug 2020
In reply to GGD:

I can't speak for UKC but I can make a guess from a programmer's perspective.

To calculate the mean you just need to keep track of two numbers - the total "grade" of all the climbs, and the climb count.  Divide one by the other and you have the mean.

For the median you need to get the full list of climbs and perform a partial sort.  This isn't difficult, but you need to retrieve all your climbs from the database to do the calculation instead of 2 numbers that can go in your profile.

For the mode you need to build a histogram of all your climbs and pick the grade that has most entries.  About the same amount of work as median, but there are some weird edge cases you need to make decisions about - e.g. the mode could be both mod and hvs.

If I was going for easy programming and performance I would use mean, if I wanted to offer a bit more I would probably use median, and maybe an actual graphical histogram rather than simple mode if I wanted to get fancy.
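
As a rough illustration of the three calculations described in this post, here is a minimal Python sketch. The grade scores in GRADE_SCORE are made up for the example and are not UKC's real scoring table; RunningMean, median_score and modes are hypothetical names. Note that statistics.median does a full sort rather than the partial sort mentioned above.

```python
from collections import Counter
import statistics

# Hypothetical numeric scores for a few trad grades (NOT UKC's real table).
GRADE_SCORE = {"Mod": 1, "Diff": 2, "VDiff": 3, "S": 4, "VS": 5, "HVS": 6}


class RunningMean:
    """Keeps only a total and a count, so adding a climb is O(1)."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, score):
        self.total += score
        self.count += 1

    def value(self):
        return self.total / self.count if self.count else None


def median_score(scores):
    """Needs the whole list of scores, not just two stored numbers."""
    return statistics.median(scores)


def modes(scores):
    """Return every score tied for most entries, exposing the tie edge case."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)


logbook = ["Mod", "Mod", "VS", "HVS", "HVS"]
scores = [GRADE_SCORE[g] for g in logbook]

running = RunningMean()
for s in scores:
    running.add(s)
```

With this made-up logbook the mode is a tie between Mod and HVS, which is exactly the kind of edge case that needs a decision.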

 apwebber 04 Aug 2020
In reply to d_b:

All of the data retrieval is already done on the graphs page. It's a few extra lines of code.

Post edited at 00:58
 planetmarshall 04 Aug 2020
In reply to d_b:

While true, these days all three are pretty trivial computations. It's not like it has to be hand coded in Assembler.

Perhaps more relevant is how useful each one is as an estimator of the "average" grade. To know that, you need some idea of the distribution of grades for a given climber, and of what you actually want "average grade" to mean.

 gooberman-hill 04 Aug 2020
In reply to d_b:

But there are good estimators of the median, which require O(1) space (I use them in embedded sensors where memory is very limited).

Steve

 d_b 04 Aug 2020
In reply to planetmarshall:

Once you have the data the calculations are really fast, it's the database access that kills you.  This set of numbers is a little bit out of date but gives an idea of just how bad hitting the disk or network is if you want to keep things moving:

https://gist.github.com/hellerbarde/2843375

For people who don't think in nanoseconds there's a "humanised" version if you scroll down a bit.

Of course UKC is built on a database, so in a sense that horse has gone.

On the question of "what we want average grades to mean", yeah.  A specification problem!

Personally I think mean is the wrong choice because the average you get depends on exactly how the grades have been subdivided and had numbers assigned.  A system that has D,VD,HVD,MS,S will not give the same answers as one that has D,VD,HVD,S etc.  A median is less sensitive to such things.
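
A small sketch of that sensitivity, assuming each grade is simply scored by its position in the scale (1, 2, 3, ...) and the mean is rounded to the nearest grade. The scoring and the summarise helper are assumptions for illustration, not how UKC necessarily does it.

```python
import statistics

# Two ways of subdividing the same range of grades (from the post above).
SCALE_WITH_MS = ["D", "VD", "HVD", "MS", "S"]
SCALE_WITHOUT_MS = ["D", "VD", "HVD", "S"]


def summarise(logbook, scale):
    """Return (mean grade, median grade), scoring grades by scale position."""
    score = {grade: i + 1 for i, grade in enumerate(scale)}
    scores = [score[g] for g in logbook]
    # Round the mean score to the nearest position in the scale.
    mean_grade = scale[round(statistics.mean(scores)) - 1]
    # median_low always returns an actual data point, so it maps to a grade.
    median_grade = scale[statistics.median_low(scores) - 1]
    return mean_grade, median_grade


logbook = ["D", "VD", "S", "S", "S"]
with_ms = summarise(logbook, SCALE_WITH_MS)
without_ms = summarise(logbook, SCALE_WITHOUT_MS)
```

For this logbook the mean lands on MS in the first scale and HVD in the second, while the median is S in both.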

In reply to apwebber:

Yeah you could do it when you generate the graphs easily enough.  My point was that running means are really, really easy to update as you add climbs to the database.  Comments about the natural laziness of software developers would have been both ungracious and a clear case of projection on my part.

Post edited at 09:14
 d_b 04 Aug 2020
In reply to gooberman-hill:

Which estimators would you recommend for noisy data?  I have an immediate use for such a thing.

I used to work with a guy who advocated median of 3 medians of 3 but I never found it very impressive.
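
For anyone unfamiliar with it, here is one possible reading of "median of 3 medians of 3", assuming it is applied to nine consecutive samples. The function names are hypothetical. It is only an approximation to the true median, as the second sample set shows.

```python
import statistics


def median3(a, b, c):
    """Median of exactly three values."""
    return sorted((a, b, c))[1]


def median_of_medians_3x3(samples):
    """Median of the medians of three consecutive groups of three samples."""
    if len(samples) != 9:
        raise ValueError("expected exactly 9 samples")
    group_medians = [median3(*samples[i:i + 3]) for i in (0, 3, 6)]
    return median3(*group_medians)


# Outliers (100, -50) are rejected and the result matches the true median.
noisy = [1, 100, 2, 3, 4, 5, 6, -50, 7]
# Here the groups are unlucky, so the result differs from the true median.
unlucky = [1, 2, 9, 3, 4, 8, 5, 6, 7]

noisy_estimate = median_of_medians_3x3(noisy)
unlucky_estimate = median_of_medians_3x3(unlucky)
unlucky_true = statistics.median(unlucky)
```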

 Andy Reeve 04 Aug 2020
In reply to GGD:

A particularly odd artefact of the use of the mean is to do with the inclusion of the XS grade in the same dataset (even though it doesn't sit in series with E grades). It seems to have been placed between E3 and E4, so on some of my graphs it shows an average grade for a particular year of XS (despite having climbed no choss whatsoever!)

 Paul Phillips - UKC and UKH 04 Aug 2020
In reply to GGD:

We base it off a scoring system for each grade (see Trad grades below). When the mean is calculated, the closest grade is shown. This is why Andy is getting XS on his stats. We do make the grade voting skip over some of the odd grades like MS, MVS and XS unless that's the actual grade assigned to the climb. We could probably do the same on the stats page but I'll leave that to when that page gets an overhaul. We could do a lot more with that page.
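
A sketch of how the "closest grade" step can produce XS, and how skipping odd grades would avoid it. The scores below are invented; the only thing taken from the thread is that XS sits between E3 and E4. nearest_grade and its skip parameter are hypothetical.

```python
# Hypothetical scores (NOT the real table): XS is placed between E3 and E4.
GRADE_SCORE = {"E1": 1, "E2": 2, "E3": 3, "XS": 4, "E4": 5, "E5": 6}


def nearest_grade(mean_score, skip=frozenset()):
    """Return the grade whose score is closest to mean_score.

    Grades named in `skip` are never returned.
    """
    candidates = {g: s for g, s in GRADE_SCORE.items() if g not in skip}
    return min(candidates, key=lambda g: abs(candidates[g] - mean_score))


# No XS routes in this logbook at all.
logbook = ["E3", "E4", "E4"]
mean_score = sum(GRADE_SCORE[g] for g in logbook) / len(logbook)

shown_now = nearest_grade(mean_score)                     # lands on XS
shown_if_skipped = nearest_grade(mean_score, skip={"XS"})  # lands on E4
```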

Post edited at 09:46

 Andy Reeve 04 Aug 2020
In reply to Paul Phillips - UKC and UKH:

Cheers Paul.

 Rob Greenwood - UKClimbing 04 Aug 2020
In reply to Andy Reeve:

Let's face it though Reeve, if anyone deserves an average grade of XS it'd be you... or possibly Dave Brown... or Dave Thomas...

Either way, wear it like a badge of honour (even if it is just a coding error)  

 wbo2 04 Aug 2020
In reply to GGD:

Does the standard deviation tell you anything about a climber? Low mean + high SD = person who does the odd spicy route?

 tomrainbow 04 Aug 2020
In reply to wbo2:

...or a person who does lots of warming up before tackling their one objective for the day?

...or a person who focuses on redpointing rather than onsighting?

It seems to me that, for it to be a meaningful measure, it would be useful to be able to find the average of the top 'x' ascents (with the user able to specify 'x') over a given timescale 'y', because people who log everything will have their average (whichever measure of average is used) affected by warm-ups, routes with their kids, etc.
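
Something like this, as a rough sketch. It assumes each logged ascent is a (date, score) pair with a made-up numeric score per grade; top_x_mean is a hypothetical helper, not an existing UKC feature.

```python
from datetime import date


def top_x_mean(ascents, x, start, end):
    """Mean score of the x hardest ascents logged from start to end inclusive."""
    in_window = [score for day, score in ascents if start <= day <= end]
    hardest = sorted(in_window, reverse=True)[:x]
    return sum(hardest) / len(hardest) if hardest else None


# Made-up logbook: two days with warm-ups plus one main objective each,
# and one hard route from a previous year that falls outside the window.
ascents = [
    (date(2020, 6, 1), 3),
    (date(2020, 6, 1), 3),
    (date(2020, 6, 1), 10),
    (date(2020, 7, 4), 4),
    (date(2020, 7, 4), 12),
    (date(2019, 5, 1), 20),
]

year_start, year_end = date(2020, 1, 1), date(2020, 12, 31)
top_two = top_x_mean(ascents, 2, year_start, year_end)
everything = top_x_mean(ascents, len(ascents), year_start, year_end)
```

Here the top-2 figure reflects the two main objectives, while the average of everything logged in 2020 is dragged down by the warm-ups.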

Of course you could say that they just don't have to log these routes but lots of people seem to want to log everything they do...and then you get an average which doesn't really inform you of anything useful at all!

Post edited at 10:46
 Andy Reeve 04 Aug 2020
In reply to Rob Greenwood - UKClimbing:

I don't know why I have this reputation, it's all Dave Brown's fault!

 gooberman-hill 04 Aug 2020
In reply to d_b:

I've found medians to be much more robust than means for a variety of applications, as they are less affected by noisy outliers.

Here is a link to the algorithm we are using for stream-based calculation of medians in low power / low memory IoT sensors:

https://gist.github.com/thomasdarimont/fff68191d45a001b2d84

Steve
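
To give a flavour of what an O(1)-space median estimator can look like, here is a generic sign-based sketch. It is not taken from the linked gist and may differ from it; StreamingMedian and its fixed step parameter are assumptions for illustration. The estimate is nudged up when a sample is above it and down when a sample is below it, so it settles where half the samples fall on each side.

```python
class StreamingMedian:
    """Sign-based running median estimate that stores a single number."""

    def __init__(self, step=0.5):
        self.step = step      # fixed step size; an assumption for this sketch
        self.estimate = None

    def update(self, x):
        if self.estimate is None:
            self.estimate = float(x)   # seed with the first sample
        elif x > self.estimate:
            self.estimate += self.step
        elif x < self.estimate:
            self.estimate -= self.step
        return self.estimate


# Made-up stream: mostly 5s, with a low reading and a large outlier.
cycle = [1, 5, 5, 5, 100]
stream = cycle * 200

est = StreamingMedian(step=0.5)
for sample in stream:
    est.update(sample)

stream_mean = sum(stream) / len(stream)
```

On this stream the estimate stays within one step of the true median of 5, while the mean is pulled above 23 by the outliers. A smaller step gives a tighter estimate but takes longer to converge.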

 Dave Garnett 04 Aug 2020
In reply to GGD:

> Looking at my own averages it seems like they're calculated as a mean. Am I mistaken?

If it bothers you, you could just delete all the easy stuff! 

 d_b 04 Aug 2020
In reply to gooberman-hill:

Thanks.  I think that could be quite useful.

My use case is cleaning up some of the noise in insufficiently converged Monte Carlo simulations, to give the user a preview of where things are going.  Mean + variance are already calculated but a really cheap median approximation would be a good thing to add.

 planetmarshall 04 Aug 2020
In reply to wbo2:

> Does the standard deviation tell you anything about a climber? Low mean + high SD  = person who does the odd spicy route?

I suspect that for an individual climber, there is rarely going to be enough data to infer anything useful. In addition you have multiple sampling biases at work - eg a 20 pitch epic in the Alps counts as much as an 8m pitch at Burbage, and I expect many people don't log repeats or failures.

If the data was reliable enough to be useful, then I think it would be the shape of the distribution that was most revealing - eg when someone tells you that they're an E1 climber when in fact they spend most of their time on VS.

 PaulJepson 04 Aug 2020
In reply to planetmarshall:

I find it useful to track progression through the grades. If I climb a similar amount each year, go on similar trips, and build a pyramid of difficulty in a similar way (e.g. loads of easy routes, lots of medium routes, and some harder routes), then I'd hopefully see an increase in average grade. This is more useful to me than the 'max grade', which could be something I seconded or top-roped. 

Obviously I'm a total punter so the curve will flatten sharpish but so far I've enjoyed it. 

Andy Gamisou 07 Aug 2020
In reply to GGD:

Never mind that, if you look at the UKC graphs for all climbers then you see that the average grade climbed for sport routes was 7a+ in 1900 - not bad for the back end of the Victorian era.  

Perhaps even more intriguingly the average sport route grade in the year 1 was 5c, and the hardest route climbed 6a+.  Maybe Jesus wasn't resurrected to show that he was the son of God, but instead to push the boundaries of sport climbing.  Not bad in sandals if so.

 Toerag 07 Aug 2020
In reply to planetmarshall:

> If the data was reliable enough to be useful, then I think it would be the shape of the distribution that was most revealing - eg when someone tells you that they're an E1 climber when in fact they spend most of their time on VS.

It would be interesting to see how many people have 'bulge' grade curves like your example above compared to those with hockeystick curves (everything they climb is at their limit).

 d_b 07 Aug 2020
In reply to Toerag:

Probably fairly common. I have led E1 but could only ever claim to be a VS/HVS climber. When I was more ambitious/less honest I probably would have claimed E1.

 Brown 07 Aug 2020
In reply to Andy Reeve:

You only need to shag one sheep.....

 apollo18 02 Sep 2020
In reply to GGD:

Off topic but can you contact me as regards some new routes that you've submitted please? 

