UKC

One for the statistics bods

This topic has been archived, and won't accept reply postings.
 edunn 04 May 2014

Apologies in advance for the heavy-duty stats on a Sunday night...

I am trying to analyse whether one independent variable is better at explaining the dependent variable than a second independent variable, using regression (i.e. by comparing R-squared values). I am not using the regression model to predict, simply to compare 'fit'. I have also checked for autocorrelation.

If my results are heteroskedastic (and the OLS estimates are therefore no longer BLUE), can I still use the R-squared values to compare fit without worrying about correcting for heteroskedasticity?
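The comparison itself is simple to compute. Here is a minimal sketch that fits each candidate predictor on its own and compares R-squared. The data and function names are hypothetical. Nothing here settles whether the comparison is trustworthy under heteroskedasticity, which is the question the rest of the thread takes up.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Share of the variance in y explained by a straight-line fit on x."""
    a, b = simple_ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: one dependent variable and two candidate predictors.
dv = [2, 4, 6, 8, 10]
iv1 = [1, 2, 3, 4, 5]
iv2 = [1, 3, 2, 5, 4]
print(r_squared(iv1, dv), r_squared(iv2, dv))
```

On this toy data iv1 gives 1.0 and iv2 gives about 0.64. Because both models have a single predictor, this is just the squared Pearson correlation of each IV with the DV.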

I have tried all sorts of methods to remove the heteroskedasticity (log IV, square-root IV, etc.) but cannot seem to get rid of it.

Any help much appreciated.

Cheers.

(Christ, I wish I was stuck miserable and cold on the side of a rock somewhere!)
Post edited at 21:05
 KingStapo 05 May 2014
In reply to edunn:

heteroskedasticity - where do they come up with this stuff?

Have you tried maximum likelihood estimation to fit your trial models to your data?
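Here is a minimal sketch of what a maximum likelihood comparison could look like. It assumes independent, constant-variance normal errors, so it does not by itself deal with heteroskedasticity. Under that assumption the ML estimates of the intercept and slope are the OLS estimates, and the ML estimate of the error variance is RSS / n. The function name and data are hypothetical.

```python
import math


def gaussian_max_loglik(x, y):
    """Maximised log-likelihood of y = a + b*x + e, with e ~ Normal(0, sigma^2).

    Assumes i.i.d. normal errors with constant variance: the ML estimates of
    a and b then equal the OLS estimates, and the ML estimate of sigma^2 is RSS / n.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / n  # the ML estimate divides by n, not n - 2
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


# Hypothetical toy data: one dependent variable, two candidate predictors.
dv = [2, 4, 6, 8, 10]
iv_a = [1, 3, 2, 5, 4]
iv_b = [1, 2, 4, 3, 5]
print(gaussian_max_loglik(iv_a, dv), gaussian_max_loglik(iv_b, dv))
```

Both models have the same number of parameters and the same DV. The higher log-likelihood therefore belongs to the model with the smaller residual sum of squares, which is the same ranking R-squared gives (here iv_b beats iv_a).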

That's the limit of my stats, I'm afraid...
 SteveoS 05 May 2014
In reply to edunn:

ANOVA?

Used to be a function in Minitab that'd do that for you.
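For a regression, the ANOVA table splits the total variation in the DV into a part explained by the predictor and a residual part, then forms an F statistic. Below is a minimal one-predictor sketch with a hypothetical function name and toy data. The F test asks whether a single predictor explains anything at all. It does not directly compare two non-nested predictors, and its usual F distribution assumes homoskedastic errors.

```python
def regression_anova(x, y):
    """ANOVA for y = a + b*x: returns (SS_regression, SS_residual, F)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_tot - ss_res
    df_reg, df_res = 1, n - 2  # one predictor; n - 2 residual degrees of freedom
    f_stat = (ss_reg / df_reg) / (ss_res / df_res)
    return ss_reg, ss_res, f_stat


# Hypothetical toy data.
print(regression_anova([1, 3, 2, 5, 4], [2, 4, 6, 8, 10]))
```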
cb294 05 May 2014
In reply to edunn:

Simple regression analysis will probably be unreliable.

Converting your data into rank data and using a measure of rank correlation (e.g. Spearman's) or a nonparametric test for repeated measurements (Friedman's test) might help.
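Spearman's rank correlation is the ordinary Pearson correlation computed on ranks. Tied values, such as the many countries with one or two medals, receive the average of the ranks they span. Below is a minimal sketch with hypothetical function names. It covers only the Spearman suggestion, not Friedman's test.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: a monotone but non-linear relationship.
print(spearman([1, 2, 3, 4], [1, 8, 27, 64]))
```

Because only the ordering matters, a monotone but curved relationship still scores a perfect 1.0.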

That's about it, here endeth my statistics knowledge....

CB
 katharine 05 May 2014
In reply to edunn:

In what way is the data heteroskedastic?
It may be that linear regression isn't appropriate to your data, so I'd plot the data (or the residuals) to check that linear regression really is the most appropriate method. That may also give you a clue as to how best to transform your data.
If the data are not badly heteroskedastic I wouldn't worry about it, but if they are, check that your model isn't mis-specified: do you need to add any extra variables? Heteroskedasticity is often caused by subsets of the data being different for some reason (e.g. an unreliable group, a problematic batch, etc.).
If your residuals are heteroskedastic then you can't rely on R^2: OLS is no longer the minimum-variance estimator, so comparing residual variance won't help. You could always try Generalized Least Squares (GLS) regression. That gives BLUE estimates provided the error covariance structure is correctly specified, but it's not a straightforward option.
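A number to go alongside the residual plot can help judge whether the data are "badly" heteroskedastic. Below is a crude check, loosely in the spirit of the Goldfeld-Quandt test. The proper test fits separate regressions to each subset and uses an F distribution; this sketch does not. It fits one OLS line, sorts the points by x, and compares the mean squared residual in the lowest-x half with that in the highest-x half. Function names and data are hypothetical.

```python
def ols_residuals(x, y):
    """Residuals from an OLS straight-line fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def spread_ratio(x, y):
    """Mean squared residual of the lowest-x half divided by that of the highest-x half."""
    pairs = sorted(zip(x, ols_residuals(x, y)))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    return (sum(e * e for e in low) / half) / (sum(e * e for e in high) / half)


# Hypothetical data with a left-weighted funnel: big scatter at small x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [3, -3, 3, -3, 0.1, -0.1, 0.1, -0.1]
y = [2 * xi + e for xi, e in zip(x, noise)]
print(spread_ratio(x, y))
```

A ratio well above 1 matches the left-weighted funnel described later in the thread, and a ratio near 1 suggests similar spread on both sides.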
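The simplest form of GLS for heteroskedastic, uncorrelated errors is weighted least squares: GLS with a diagonal error covariance, where each point is weighted by the inverse of its error variance. Here is a minimal one-predictor sketch with a hypothetical function name. The weights are an assumption you must supply. Choosing weights proportional to x, for example, encodes the hypothesis that the error variance shrinks like 1/x, i.e. more spread on the left.

```python
def wls(x, y, w):
    """Weighted least squares fit of y = a + b*x with positive weights w.

    Equivalent to GLS with a diagonal error covariance when w_i is
    proportional to 1 / Var(e_i). Equal weights reproduce OLS.
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical usage: weights proportional to x (assumed variance ~ 1/x).
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
print(wls(x, y, x))
```

With equal weights this reduces to ordinary least squares. If the assumed weights are wrong, the estimates lose the efficiency that motivated using GLS in the first place.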

Does it have to be one variable or the other?
If your two independent variables are not strongly correlated, my approach would be to run

DV~IV1+IV2+IV2.IV2

Then do a stepwise regression using robust standard errors to get the best model. It's not a formal test, but it will tell you what's most appropriate.
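"Robust standard errors" usually means heteroskedasticity-consistent (White) standard errors. With one predictor, the HC0 version of the slope's standard error has a closed form: the square root of the sum over i of (x_i - mean x)^2 e_i^2, divided by the sum of (x_i - mean x)^2. Below is a minimal sketch comparing it with the classical standard error, using a hypothetical function name and toy data. The stepwise selection itself is not shown.

```python
import math


def slope_standard_errors(x, y):
    """Classical and White (HC0) heteroskedasticity-robust SEs for the OLS slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Classical SE assumes one common error variance, estimated as RSS / (n - 2).
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 lets every observation keep its own squared residual.
    meat = sum((xi - mx) ** 2 * e * e for xi, e in zip(x, resid))
    se_hc0 = math.sqrt(meat) / sxx
    return se_classical, se_hc0


# Hypothetical toy data.
print(slope_standard_errors([1, 3, 2, 5, 4], [2, 4, 6, 8, 10]))
```

The slope estimate is unchanged; only its estimated uncertainty differs. Statistical packages often default to small-sample variants (HC1 to HC3) rather than plain HC0.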


OP edunn 05 May 2014

Thanks all,

Katharine...

The data are heteroskedastic in that the variance of the residuals is larger towards the left of the graph, creating a left-weighted funnel shape. I am fairly certain that my data are linear: I am simply plotting population and GDP per capita against number of Olympic medals. I have also tried log-population and log-GDPcap, but the pattern still exists.

My thought is that because medal totals are 'grouped' (i.e. lots of countries with one or two medals and only a few with 10+), this accentuates the variance?

I would say that my variables are fairly strongly correlated (GDPcap and population?); however, generalised least squares might be an option.

Could you explain why you use IV1 + IV2 + IV2.IV2? I don't understand the 'IV2.IV2' term. What does this add?

Thanks again.

 psaunders 05 May 2014
In reply to edunn:

I expect katharine meant IV1.IV2, an interaction term.
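An interaction term is just an extra column equal to the product of the two predictors. "IV1.IV2" is the thread's notation; R and patsy formulas write it as `IV1:IV2`. Below is a minimal sketch, with hypothetical function names, that fits DV ~ IV1 + IV2 + IV1:IV2 by solving the OLS normal equations. It is for illustration only; a statistics package would use a more numerically stable solver.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_with_interaction(iv1, iv2, dv):
    """OLS coefficients [intercept, IV1, IV2, IV1*IV2] via the normal equations."""
    X = [[1.0, a, b, a * b] for a, b in zip(iv1, iv2)]  # product column = interaction
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, dv)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical data generated exactly from DV = 1 + 2*IV1 + 3*IV2 + 0.5*IV1*IV2.
x1 = [1, 2, 3, 1, 2, 3]
x2 = [1, 1, 1, 2, 2, 2]
y = [1 + 2 * a + 3 * b + 0.5 * a * b for a, b in zip(x1, x2)]
print(fit_with_interaction(x1, x2, y))
```

The interaction coefficient measures how much the effect of IV1 on the DV changes per unit of IV2. In this toy example, the IV1 slope is 2.5 when IV2 = 1 and 3.0 when IV2 = 2.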
OP edunn 06 May 2014
In reply to edunn:

Just in case anyone's interested, the problem is that I'm using cross-sectional data, which often display heteroskedasticity. The way round it is to use generalized least squares (I was using ordinary least squares).
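The thread doesn't say which GLS variant was used. One common recipe for cross-sectional heteroskedasticity is two-step feasible GLS. First fit OLS, then model the error variance from the residuals, then refit by weighted least squares using the inverse of the fitted variances as weights. The sketch below assumes the log of the error variance is linear in x; that variance model and all function names are hypothetical choices.

```python
import math


def wls(x, y, w):
    """Weighted least squares fit of y = a + b*x; equal weights give OLS."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def fgls(x, y):
    """Two-step feasible GLS, assuming log Var(e_i) = c0 + c1 * x_i."""
    ones = [1.0] * len(x)
    # Step 1: OLS fit and its residuals.
    a, b = wls(x, y, ones)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Step 2: regress log squared residuals on x (floor avoids log(0)).
    log_e2 = [math.log(max(e * e, 1e-12)) for e in resid]
    c0, c1 = wls(x, log_e2, ones)
    # Step 3: refit, weighting each point by 1 / fitted variance.
    weights = [1.0 / math.exp(c0 + c1 * xi) for xi in x]
    return wls(x, y, weights)


# Hypothetical funnel-shaped data around the line y = 1 + 2x (wider on the left).
x = [1, 1, 2, 2, 3, 3, 4, 4]
spread = [4, 4, 2, 2, 1, 1, 0.5, 0.5]
signs = [1, -1] * 4
y = [1 + 2 * xi + s * d for xi, s, d in zip(x, signs, spread)]
print(fgls(x, y))
```

The toy data are symmetric around the true line, so both OLS and FGLS recover roughly (1, 2) here. In real data the point of the reweighting is that the noisy low-x points count for less, which changes the estimates and their standard errors.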

Simple really!

Thanks all.
