UKC Forums - Manipulating Probability Distributions

Manipulating Probability Distributions


James Malloch 26 Aug 2016

Suppose I have a cumulative probability distribution which has probability on the y-axis and a continuous variable on the x-axis.

The distribution has boundaries on the x-axis (min and max) which are, for the purposes of this, say 0 and 2. I.e. 0% of observations are at 0, and every observation is less than 2. Let the median (the value where the cumulative probability reaches 50%) be 0.6.

So 50% of values are between 0 and 0.6. 50% of values are between 0.6 and 2.

Now, suppose I want a new median value. Say, 0.9. And I want to get this by manipulating the existing distribution so that 50% of values are now between 0 and 0.9. 50% of values are between 0.9 and 2.

I can attempt this by:

1) Linearly shifting the distribution to the right. I.e. keep the same shape and move the distribution 0.3 to the right. This gives the correct median, but this makes my new range 0.3 – 2.3, which I don’t want.
2) Multiplicatively stretching the distribution to the right. I.e. multiply each x-axis value by 1.5. This gives the correct median but would make the range 0-3.
3) Some kind of normalisation, maybe?
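To see the trade-off in options 1 and 2 concretely, here is a minimal Python sketch. `cdf_old` is a made-up stand-in, F(x) = (x/2)^P on [0, 2] with P chosen so the median is 0.6; it is not the real data, and the function names are hypothetical.

```python
import math

# Hypothetical stand-in for the observed CDF: F(x) = (x / 2) ** P on [0, 2],
# with P chosen so that F(0.6) = 0.5, i.e. the median is 0.6.
P = math.log(0.5) / math.log(0.3)


def cdf_old(x: float) -> float:
    if x <= 0.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return (x / 2.0) ** P


def cdf_shifted(x: float, shift: float = 0.3) -> float:
    """Option 1: slide the whole curve right by `shift`; support becomes [0.3, 2.3]."""
    return cdf_old(x - shift)


def cdf_stretched(x: float, factor: float = 1.5) -> float:
    """Option 2: multiply every x value by `factor`; support becomes [0, 3]."""
    return cdf_old(x / factor)


if __name__ == "__main__":
    for name, cdf in (("shifted", cdf_shifted), ("stretched", cdf_stretched)):
        # Both hit the target median of 0.9 but leave some probability above 2.
        print(name, "F(0.9) =", round(cdf(0.9), 3), "P(X > 2) =", round(1.0 - cdf(2.0), 3))
```

Both variants give F(0.9) = 0.5, but with this toy CDF roughly 9% (shift) and 21% (stretch) of the probability ends up above the upper bound of 2.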

So – my question is, does anyone know a way to move the median, without affecting the boundaries, and still maintaining a shape similar to the original distribution?

My thoughts are along the lines of: doing something as a function of the median, so that values closer to the median are affected more; normalising before adjusting it; or maybe moving/squashing it in the vertical direction, possibly in log-odds space.
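One possible reading of the "vertical, in log-odds space" idea, sketched in Python: add a constant to logit(F(x)) everywhere, choosing the constant so the new CDF passes through 0.5 at the new median. Because logit(0) and logit(1) are infinite, F stays 0 at the lower bound and 1 at the upper bound, and the curve stays smooth and increasing. This is an assumed interpretation rather than a standard recipe for this exact problem (a constant log-odds shift is the same structure a proportional-odds model uses); `cdf_old` is a made-up stand-in with median 0.6 and the helper names are hypothetical.

```python
import math

# Hypothetical stand-in for the observed CDF on [0, 2] with median 0.6.
P = math.log(0.5) / math.log(0.3)


def cdf_old(x: float) -> float:
    if x <= 0.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return (x / 2.0) ** P


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_odds_shift(cdf, new_median: float):
    """Return a CDF whose log-odds are those of `cdf` plus a constant,
    chosen so that the returned CDF equals 0.5 at `new_median`."""
    p_at_target = cdf(new_median)
    if not 0.0 < p_at_target < 1.0:
        raise ValueError("new_median must lie strictly inside the support")
    delta = -logit(p_at_target)

    def cdf_new(x: float) -> float:
        p = cdf(x)
        if p <= 0.0 or p >= 1.0:
            return p  # the bounds (F = 0 and F = 1) are left untouched
        return logistic(logit(p) + delta)

    return cdf_new


cdf_new = log_odds_shift(cdf_old, 0.9)
```

Here `delta` is negative (the old CDF is already above 0.5 at 0.9), so the whole curve is pushed down and the median moves right; the change in probability is largest in the middle and fades out towards both bounds.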

I hope this is clear(ish), and I imagine there will be many ways to do it. But I dropped my stats modules at uni and it’s now come back to bite me in the ass.

Thanks
James

cb294 26 Aug 2016

In reply to James Malloch:
Like in real life, normalization does not allow you to have your cake and eat it!

If you fix your boundaries and apply some renormalization function that changes every single value so that the median falls elsewhere, you necessarily change the shape of your distribution.

Conversely, any function shifting the median while fixing skewness and kurtosis must shift the boundaries.

Even faking additional data points won't help....

CB

Post edited at 09:58

timjones 26 Aug 2016

In reply to James Malloch:

I wonder why mere mortals are so sceptical about statistics

OP James Malloch 26 Aug 2016

In reply to cb294:

> Fix your boundaries and apply some renormalization function that changes every single value such that the median falls elsewhere necessarily changes the shape of your distribution.

> Conversely, any function shifting the median while fixing skewness and kurtosis must shift the boundaries.

> Even faking additional data points won't help....

I'm happy for the shape to change, but would like it to remain similar.

Without a formula, this would be possible: if I could firmly hold the end points, I could bash the line in between about a little so it followed roughly the same shape, but was less steep in the first half and steeper in the second half. Then model the line, et voilà: a new distribution with the correct median that looks similar to the original.

That's kind of faking new points based on the old points, essentially.

I just wondered if there was a mathematical way to do the bashing.

OP James Malloch 26 Aug 2016

In reply to timjones:

> I wonder why mere mortals are so sceptical about statistics

Tell me about it - all the reasons I dropped it are coming flooding back!

Ramblin dave 26 Aug 2016

In reply to James Malloch:

Could you apply separate scaling functions to the two halves of the x axis either side of the new median, so that the area x < 0.9 gets squashed into x < 0.6 and the area x > 0.9 gets stretched into x > 0.6? Then calculate your new cumulative distribution function at x by first working out where x gets moved to and then evaluating the old function at that point.

x -> 2x / 3 (x <= 0.9)
x -> 0.6 + 14 (x - 0.9) / 11 (x > 0.9)

These work as mapping functions, although it might be possible to find an alternative that's smooth at 0.9.
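A minimal Python sketch of this piecewise-linear remapping, assuming the CDF is available as a function (`cdf_old` below is a made-up stand-in with median 0.6, and the function names are hypothetical). With the default arguments, `to_old_axis` reproduces the two formulas above. `to_new_axis` goes the other way: pushing each observed value through it gives values whose median sits at 0.9.

```python
import math

# Hypothetical stand-in for the observed CDF on [0, 2] with median 0.6.
P = math.log(0.5) / math.log(0.3)


def cdf_old(x: float) -> float:
    if x <= 0.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return (x / 2.0) ** P


def to_old_axis(x, lo=0.0, hi=2.0, old_median=0.6, new_median=0.9):
    """Map a point on the new x axis to the old one, linearly on each side of the median."""
    if x <= new_median:
        return lo + (x - lo) * (old_median - lo) / (new_median - lo)
    return old_median + (x - new_median) * (hi - old_median) / (hi - new_median)


def to_new_axis(y, lo=0.0, hi=2.0, old_median=0.6, new_median=0.9):
    """Inverse map: move an observed (old) value to where it sits in the new scenario."""
    if y <= old_median:
        return lo + (y - lo) * (new_median - lo) / (old_median - lo)
    return new_median + (y - old_median) * (hi - new_median) / (hi - old_median)


def cdf_new(x: float) -> float:
    # New CDF at x = old CDF evaluated at the point x gets moved to.
    return cdf_old(to_old_axis(x))
```

The kink shows up as a change in the mapping's slope at 0.9 (from 2/3 to 14/11), which becomes a jump in the density of the new distribution at that point.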

OP James Malloch 26 Aug 2016

In reply to Ramblin dave:

I like that approach - nice and simple (which is what I'm hoping for) and does what's required.

I'd have to actually do the calculations, as it would mean there's a kink at the median value from making one side steeper and the other less steep. Not sure how this would affect our model results, but first thoughts make me think it would be okay.

Many thanks!

Open to other suggestions too, it's a bit of a brainstorming phase at the moment.

cb294 26 Aug 2016

In reply to James Malloch:
Depends on what you mean by "similar", i.e. which properties of your distribution you want conserved. If you just want the number of peaks conserved, you could easily devise a transformation function that shifts each point using a variable scaling factor that continuously drops to 1 for your upper boundary. This would have the effect of "bashing" your curve to the right. I am not sure off the top of my head which properties are conserved and which are lost.
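One concrete (assumed) choice of such a transformation, sketched in Python: the power map h(x) = 2 (x/2)^a, with a picked so that h(0.6) = 0.9. It fixes both bounds (this particular form relies on the lower bound being 0), is smooth and increasing, and its scaling factor h(x)/x = (x/2)^(a-1) falls continuously to 1 at the upper boundary. It is only one option among many; `cdf_old` is a made-up stand-in and the names are hypothetical.

```python
import math

LO, HI = 0.0, 2.0
OLD_MEDIAN, NEW_MEDIAN = 0.6, 0.9

# Exponent chosen so that h(OLD_MEDIAN) == NEW_MEDIAN; this form requires LO == 0.
A = math.log(NEW_MEDIAN / HI) / math.log(OLD_MEDIAN / HI)

# Hypothetical stand-in for the observed CDF on [0, 2] with median 0.6.
P = math.log(0.5) / math.log(0.3)


def cdf_old(x: float) -> float:
    if x <= LO:
        return 0.0
    if x >= HI:
        return 1.0
    return (x / HI) ** P


def h(x: float) -> float:
    """Move an old value to its new position; h(0) = 0 and h(2) = 2."""
    return HI * (x / HI) ** A


def h_inverse(y: float) -> float:
    return HI * (y / HI) ** (1.0 / A)


def scale_factor(x: float) -> float:
    """h(x) / x = (x / HI) ** (A - 1): greater than 1 below HI, exactly 1 at HI."""
    return (x / HI) ** (A - 1.0)


def cdf_new(y: float) -> float:
    # A new value y corresponds to the old value h_inverse(y).
    return cdf_old(h_inverse(y))
```

With this toy CDF the result happens to stay in the same family, F_new(y) = (y/2)^(P/a), so only the exponent changes; with real data, how much the shape changes will depend on the curve.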

In general, I would recommend avoiding custom normalizations that are not trivial (e.g., centering the peak of a Gaussian distribution on zero) or intrinsically justified by the experiment. Usually it is better to use a statistical test that does not require the distributions to adhere to some property that they do not have straight off the measurement.

Often you can find additional tests that allow you to ignore some property (of course that comes at a price in terms of power). My favourite approach, though, is to use nonparametric methods, especially when the experiment allows easy adjustment of absolute values.

CB

edit: shifts, not shits...

Post edited at 10:36

Paul Baxter 26 Aug 2016

In reply to Ramblin dave:

That's the right way to do it. You can use quadratic functions if you also want the derivative to be continuous at 0.9.
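A sketch of the quadratic version in Python, under stated assumptions: each side of 0.9 uses a quadratic through (0.9, 0.6) with the same chosen slope at 0.9, bent so that it still hits the bound on its side. The slope is a free parameter; with these numbers the map stays increasing only for 0 < slope < 4/3, which `check_increasing` verifies. `cdf_old` is a made-up stand-in and the names are hypothetical.

```python
import math

LO, HI = 0.0, 2.0
OLD_MEDIAN, NEW_MEDIAN = 0.6, 0.9

# Hypothetical stand-in for the observed CDF on [0, 2] with median 0.6.
P = math.log(0.5) / math.log(0.3)


def cdf_old(x: float) -> float:
    if x <= LO:
        return 0.0
    if x >= HI:
        return 1.0
    return (x / HI) ** P


def _curvature(end: float, slope: float) -> float:
    # Quadratic q(t) = OLD_MEDIAN + slope*t + c*t**2 with t = x - NEW_MEDIAN;
    # c is chosen so that q(end - NEW_MEDIAN) == end.
    s = end - NEW_MEDIAN
    return (end - OLD_MEDIAN - slope * s) / (s * s)


def check_increasing(slope: float) -> None:
    """q'(t) is linear in t, so it is positive on a side iff positive at both ends."""
    for end in (LO, HI):
        s = end - NEW_MEDIAN
        if slope <= 0 or slope + 2 * _curvature(end, slope) * s <= 0:
            raise ValueError(f"slope {slope} makes the map non-increasing near {end}")


def to_old_axis(x: float, slope: float = 1.0) -> float:
    """Map new-axis x to the old axis; value and first derivative match at NEW_MEDIAN."""
    end = LO if x <= NEW_MEDIAN else HI
    t = x - NEW_MEDIAN
    return OLD_MEDIAN + slope * t + _curvature(end, slope) * t * t


def cdf_new(x: float, slope: float = 1.0) -> float:
    return cdf_old(to_old_axis(x, slope))
```

Usage: call `check_increasing(1.0)` once, then evaluate `cdf_new(x)`. Compared with the straight-line version, the density no longer jumps at 0.9, at the cost of one extra parameter to choose.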

OP James Malloch 26 Aug 2016

In reply to cb294:

Thanks again CB. A variable scaling technique was something I was trying yesterday, but the way I was doing it didn't work well. I'll give it another attempt later to see if I can rectify it.

The general premise of the test is that we have an observed distribution and we will use a forecast which produces a new median. So if the forecast shows a median sufficiently far from the current one, we will essentially be modelling a completely new scenario.

But the scenario still cannot exceed the max/min and we only have the observed data to infer how the new scenario may react.

Jon Read 26 Aug 2016

In reply to James Malloch:

Why do you want to do this transformation of your data?

OP James Malloch 26 Aug 2016

In reply to Jon Read:

See the previous post (posted at the same time as yours).

It's essentially saying, we know what's happened in the past. If we predict future scenarios where the same constraints apply, but the median shifts, what would the weighted mean be?

So say 0 is good and 2 is bad. If we predict things are, on average, worse in the future, how might the probabilities of hitting 1.5 or 0.2 change compared to what we have seen in the past?
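A minimal Python sketch of how the past and forecast distributions could be compared once a remapped CDF exists. It assumes the variable lives on [0, 2], so the mean can be computed from the CDF as E[X] = 0 + integral from 0 to 2 of (1 - F(x)) dx, and it uses the piecewise-linear remapping suggested in the thread purely as an example. `cdf_old` is a made-up stand-in with median 0.6, not real data.

```python
import math

LO, HI = 0.0, 2.0

# Hypothetical stand-in for the observed CDF on [0, 2] with median 0.6.
P = math.log(0.5) / math.log(0.3)


def cdf_old(x: float) -> float:
    if x <= LO:
        return 0.0
    if x >= HI:
        return 1.0
    return (x / HI) ** P


def cdf_new(x: float) -> float:
    # Piecewise-linear remapping of the new axis onto the old one (median 0.6 -> 0.9).
    old_x = 2.0 * x / 3.0 if x <= 0.9 else 0.6 + 14.0 * (x - 0.9) / 11.0
    return cdf_old(old_x)


def mean_from_cdf(cdf, n: int = 20_000) -> float:
    """E[X] = LO + integral of (1 - F(x)) over [LO, HI], via the midpoint rule."""
    width = (HI - LO) / n
    total = sum(1.0 - cdf(LO + (i + 0.5) * width) for i in range(n))
    return LO + total * width


for label, cdf in (("past", cdf_old), ("forecast", cdf_new)):
    print(
        f"{label}: mean = {mean_from_cdf(cdf):.3f}, "
        f"P(X > 1.5) = {1.0 - cdf(1.5):.3f}, P(X <= 0.2) = {cdf(0.2):.3f}"
    )
```

Any of the other remappings can be swapped in for `cdf_new`; the comparison step stays the same.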

Philip 26 Aug 2016

In reply to James Malloch:

The simple answer is no. You can't have the same shape and keep the boundaries.

If you want to increase the median, add the amount you need to increase by to every data point. That will keep the shape but shift the data to 0.3-2.3.

You could then compress the distribution from a span of 2.0 to a span of 1.7.

To do both : take x and do (x-1)*0.85 +1.3

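That will give you a similar distribution shape (exactly the same shape, rescaled), but it maps the range onto 0.45-2.15 and the median onto (0.6-1)*0.85 + 1.3 = 0.96 rather than 0.9.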
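A quick Python check of the arithmetic, treating shift-and-compress as an affine map y = scale*x + offset (a positive affine map keeps the shape exactly, including skewness and kurtosis). Compressing about the centre of the shifted range, i.e. (x-1)*0.85 + 1.3 = 0.85x + 0.45, gives 0.45-2.15 with median 0.96. Compressing so the top stays at 2.0 gives 0.3-2.0 with median 0.81. An affine map that hits 0.9 exactly with the top kept at 2 pushes the bottom up to 3/7, about 0.43. The helper names are illustrative only.

```python
def affine(x: float, scale: float, offset: float) -> float:
    """Positive affine map: changes location and spread, keeps the shape."""
    return scale * x + offset


def affine_hitting_median(old_median: float, new_median: float, hi: float = 2.0):
    """Solve scale * old_median + offset = new_median and scale * hi + offset = hi."""
    scale = (hi - new_median) / (hi - old_median)
    offset = hi - scale * hi
    return scale, offset


variants = {
    # (x - 1) * 0.85 + 1.3: shift by 0.3, compress about the centre of 0.3-2.3.
    "compress about centre": (0.85, 0.45),
    # Shift by 0.3, compress so the top stays at 2.0.
    "keep top at 2": (0.85, 0.3),
    # Exact median 0.9 with the top kept at 2.
    "exact median": affine_hitting_median(0.6, 0.9),
}

for name, (scale, offset) in variants.items():
    lo_new, hi_new = affine(0.0, scale, offset), affine(2.0, scale, offset)
    median_new = affine(0.6, scale, offset)
    print(f"{name}: range {lo_new:.3f}-{hi_new:.3f}, median {median_new:.3f}")
```

Whichever variant is used, keeping the shape exactly means at least one bound has to move.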