I think your Garmin 310 + metabolic/VO2 max testing is about as accurate as it gets for this fickle science. The Garmin 910 has a slightly better algorithm, I think, but it's probably a minor difference for a big investment.
I think the best you can get with any method is maybe +/- 10% error.
I previously had a Garmin 405 (one of the early units), and the touch bezel was a little wacky in the rain or when I was dripping with sweat (like 4+ hour run sweaty), but it got better with a software update. Later-model 405s and the 410 improved it a bit more. Not their best feature choice.
I suspect there's a new Garmin on the horizon, and the 405/410 are probably on their way out.
It seems that the software used to interpret the data recorded by your HRM is probably more important than the HRM itself. Unfortunately, my suggestion requires both new hardware and new software. I guess what you need depends on your goals.