How YouGov's MRP model works for the 2024 U.S. presidential and congressional elections

Delia Bailey, Senior Vice President of Data Science, Innovations
Douglas Rivers, Chief Scientist
September 24, 2024, 9:24 PM GMT+0

A guide to the methodology behind YouGov's use of multilevel regression with post-stratification (MRP) to project how Americans will vote in the 2024 election. Last update: November 1, 2024

What method are you using to estimate votes?

Our approach to estimating the vote is based on a multilevel regression with post-stratification (MRP) model. We have used this approach successfully in past U.S. elections (including 2016, 2018, and 2020) and elsewhere. It uses a statistical model to predict votes for everyone in the national voter file, whether or not they belong to YouGov’s panel. Interviews with our panelists are used to train a model that classifies people as likely to vote for a particular candidate (or to not vote), and this model is then applied to the entire voter file. We then aggregate these predictions, a step referred to as post-stratification, to estimate votes for all registered voters. The model has three stages: (i) estimate the likelihood of voting; (ii) conditional upon voting, estimate the probability of voting for a major-party or third-party candidate; and finally (iii) predict support for each candidate.
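The three stages and the aggregation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not YouGov's implementation: the record fields (`p_vote`, `p_major`, `p_dem_given_major`), the two-party reading of stage (iii), and all numbers are hypothetical, and the probabilities are simply given here rather than fitted by a multilevel regression.

```python
from collections import defaultdict

# Hypothetical voter-file records with already-predicted probabilities.
# Field names and values are illustrative assumptions, not YouGov data.
voter_file = [
    {"state": "ME", "p_vote": 0.80, "p_major": 0.95, "p_dem_given_major": 0.60},
    {"state": "ME", "p_vote": 0.40, "p_major": 0.90, "p_dem_given_major": 0.45},
    {"state": "NE", "p_vote": 0.70, "p_major": 0.97, "p_dem_given_major": 0.35},
]


def expected_votes(record):
    """Chain the three stage probabilities into expected votes for one person."""
    p_vote = record["p_vote"]                      # stage (i): turnout
    p_major = p_vote * record["p_major"]           # stage (ii): major party, given voting
    dem = p_major * record["p_dem_given_major"]    # stage (iii): candidate, given major party
    rep = p_major * (1.0 - record["p_dem_given_major"])
    other = p_vote * (1.0 - record["p_major"])
    return {"dem": dem, "rep": rep, "other": other}


def poststratify(records):
    """Sum expected votes over every record in each state, then convert to shares."""
    totals = defaultdict(lambda: {"dem": 0.0, "rep": 0.0, "other": 0.0})
    for record in records:
        for candidate, votes in expected_votes(record).items():
            totals[record["state"]][candidate] += votes
    shares = {}
    for state, tally in totals.items():
        turnout = sum(tally.values())
        if turnout > 0:  # skip states with no expected voters
            shares[state] = {c: v / turnout for c, v in tally.items()}
    return shares


state_shares = poststratify(voter_file)
```

Because each person contributes expected votes in proportion to their turnout probability, likely voters carry more weight in the state totals than unlikely ones.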

Who are you surveying?

The model is based on interviews with U.S. adults from YouGov’s U.S. panel as part of the SAY24 project, a collaboration between Stanford, Arizona State, and Yale Universities. We have been interviewing most of these panelists on a regular basis since December 2023 with periodic re-interviews — either monthly or quarterly, depending upon the panelist. Panelists have been asked about their likelihood of voting and who they intend to vote for, along with a host of other questions.

How many people are you surveying?

We are surveying a much larger sample than is normally used for opinion polling, which allows us to make estimates for each state and congressional district. The first set of estimates of presidential and congressional voting was based on nearly 100,000 interviews. The second set of estimates is based partly on interviews from the first set of estimates in August and September, with the models updated using responses from more than 20,000 registered voters who were re-interviewed in late September and early October. The third set of estimates is based on interviews with 57,784 registered voters between October 25 and 31, 2024.

Are you interviewing people only once?

No. A unique feature of this study is that we can track the same people over time and see how they shift, or don’t shift, as the campaign progresses. Most polls draw a new sample each time a survey is conducted. Because we re-interview the same people, we can distinguish real changes (voters switching among candidates, or between voting and not voting) from variations due to changes in sample composition. So far in 2024, we have seen striking stability in voters’ candidate preferences.
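One way to see why re-interviewing helps is to compare each panelist's answers across two waves directly. The sketch below is a hypothetical illustration, not YouGov's code: the panelist IDs, the preference labels, and the function names are all assumptions. It counts transitions only among people who appear in both waves, so a change in who was sampled cannot be mistaken for a change of mind.

```python
from collections import Counter

# Hypothetical answers keyed by panelist ID; IDs and labels are illustrative.
wave1 = {"p1": "A", "p2": "B", "p3": "undecided", "p4": "A"}
wave2 = {"p1": "A", "p2": "B", "p3": "A", "p5": "B"}


def transition_counts(earlier, later):
    """Count (earlier, later) preference pairs for panelists present in both waves."""
    common = earlier.keys() & later.keys()
    return Counter((earlier[pid], later[pid]) for pid in common)


def stability_rate(earlier, later):
    """Share of re-interviewed panelists whose stated preference did not change."""
    counts = transition_counts(earlier, later)
    total = sum(counts.values())
    unchanged = sum(n for (before, after), n in counts.items() if before == after)
    return unchanged / total if total else float("nan")


transitions = transition_counts(wave1, wave2)
stability = stability_rate(wave1, wave2)
```

Here panelists `p4` and `p5` each appear in only one wave and are ignored, which is the panel's advantage: two independent samples would show only the net difference between waves, with no way to separate switching from composition.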

What should people keep in mind when reading MRP-based estimates?

We caution that these models are based upon what people tell us they plan to do. A small share of registered voters say they are undecided, but the majority tell us their minds are made up. However, people can change their minds and, if they do, we should see these changes reflected in our model updates. These results reflect our best estimate of the current state of the race — they are not predictions. And the estimates have uncertainty, as does any measurement using survey data.

What outside data are you using to improve accuracy?

One important feature of this dataset is that we have matched the participants to TargetSmart’s national voter file. This means that, except for people who tell us they intend to register between now and the election, we have a sample of verified registered voters. We have also linked voters to tabulated votes in the precinct where they are registered. This helps ensure that the sample is representative of different geographies, some of which have been underrepresented in prior years.

Why is it important to use statistical modeling to supplement interviews?

Even with a large sample, we are still short of data in some key areas and hard-to-reach demographics. In Maine and Nebraska, for example, electoral votes are allocated by congressional district, and we have to estimate votes for each. In small states, we have correspondingly small samples. The same goes for congressional districts. And in large states, some groups — such as younger voters and rural voters — can be in short supply.
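The intuition behind the multilevel part of MRP is that estimates for small groups borrow strength from the larger pool. The sketch below shows that idea with a simple shrinkage formula, a deliberately simplified stand-in for a full multilevel regression. The function name, the `prior_strength` parameter, and all numbers are assumptions made for illustration.

```python
def partially_pooled(supporters, n, overall_rate, prior_strength=50.0):
    """Shrink a group's raw support rate toward the overall rate.

    prior_strength is a hypothetical tuning value: it acts like that many
    extra interviews answering at the overall rate, so small groups are
    pulled strongly toward the overall rate and large groups barely move.
    """
    return (supporters + prior_strength * overall_rate) / (n + prior_strength)


overall = 0.50

# Small district: 8 of 10 respondents support the candidate (raw rate 0.80).
small_district = partially_pooled(8, 10, overall)

# Large state: 8,000 of 10,000 respondents (raw rate also 0.80).
large_state = partially_pooled(8000, 10000, overall)
```

With these numbers the small district's estimate moves most of the way toward the overall rate, while the large state's estimate stays close to its raw 0.80, because the data there outweigh the pooled information.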

What is the schedule for release of voting estimates based on the MRP model?

Our first releases for the presidential and congressional elections were in September. We updated our electoral-vote model and our congressional model in mid-October, based upon more than 20,000 interviews that we conducted between September 18 and October 15. We then interviewed more than 55,000 panelists in late October for the November 1 release of our final forecast prior to the election.

Who made the model?

The model has been designed and tested by members of the YouGov political team in the U.S., headed by Doug Rivers and Delia Bailey. MRP is an invention of Andrew Gelman, who is known around our office as Mister P.

— Carl Bialik and David Montgomery contributed to this article
