Monday, 6 July 2015

st.statistics - Combining regressions

I have two variables, $x$ and $y$, and I am using linear regression to predict $y$ from $x$ over a large set of subjects. There are multiple observations per subject.



I have tried several things:

  1. pool all observations together and fit one model to the entire dataset;
  2. fit separate models (same specification as (1)) for each subject; a rough sketch of both fits is shown after this list.
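For concreteness, here is a minimal sketch of the two fits in Python. The column names (`subject`, `x`, `y`) and the use of scikit-learn are my assumptions, not details from the post.

```python
from sklearn.linear_model import LinearRegression

# df is assumed to be a pandas DataFrame with columns 'subject', 'x', 'y'
# (hypothetical names; one row per observation).

def fit_pooled(df):
    """Approach (1): a single regression over all observations pooled together."""
    return LinearRegression().fit(df[['x']], df['y'])

def fit_per_subject(df):
    """Approach (2): a separate regression of y on x for each subject."""
    return {
        subj: LinearRegression().fit(grp[['x']], grp['y'])
        for subj, grp in df.groupby('subject')
    }
```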

Now, (1) works reasonably well, but it obviously takes no account of subject-specific effects. On the other hand, (2) is prone to overfitting, which shows up as poor out-of-sample performance.



If I average the two predictions, the result performs better out of sample than either (1) or (2). However, this is clearly somewhat ad hoc.
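The averaging step might look like the following sketch (again an assumption about the details; the equal 50/50 weight and the helper names are mine).

```python
import pandas as pd

def predict_averaged(df, pooled, per_subject, w=0.5):
    """Blend the pooled and per-subject predictions; w = 0.5 is the simple average."""
    out = pd.Series(index=df.index, dtype=float)
    for subj, grp in df.groupby('subject'):
        p1 = pooled.predict(grp[['x']])              # prediction from approach (1)
        p2 = per_subject[subj].predict(grp[['x']])   # prediction from approach (2)
        out.loc[grp.index] = w * p1 + (1 - w) * p2
    return out
```

Out-of-sample performance of the blend could then be compared against (1) and (2) on held-out observations, e.g. by mean squared error.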



My question is: what might be a better way to combine (1) and (2) into a single predictor?



Also, I have reasons to think that "similar" subjects should have similar regression coefficients. Is there any way to make use of this?



[Edit] I should have mentioned that there is no natural hierarchy to the subjects. However, I think I can come up with a reasonable similarity metric.
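One concrete reading of that idea (entirely my own assumption, not something stated in the post) is to shrink each subject's per-subject coefficients toward a similarity-weighted average of the other subjects' coefficients, using the similarity metric as weights.

```python
import numpy as np

def smooth_coefficients(per_subject, similarity, alpha=0.5):
    """Shrink each subject's (intercept, slope) toward a similarity-weighted
    average of the other subjects' coefficients.

    per_subject : dict mapping subject -> fitted LinearRegression (approach (2))
    similarity  : dict mapping (subj_i, subj_j) -> nonnegative weight (hypothetical)
    alpha       : amount of shrinkage toward similar subjects (0 = no smoothing)
    """
    subjects = list(per_subject)
    coefs = {s: np.array([per_subject[s].intercept_, per_subject[s].coef_[0]])
             for s in subjects}
    smoothed = {}
    for s in subjects:
        weights = np.array([similarity.get((s, t), 0.0) for t in subjects if t != s])
        others = np.array([coefs[t] for t in subjects if t != s])
        if weights.size and weights.sum() > 0:
            neighbour_avg = (weights[:, None] * others).sum(axis=0) / weights.sum()
            smoothed[s] = (1 - alpha) * coefs[s] + alpha * neighbour_avg
        else:
            smoothed[s] = coefs[s]  # no similar subjects: keep the original fit
    return smoothed
```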
