Total Survey Design

How to Lie with Surveys

May 31, 2024 Azdren Coma and Seon Yup Lee Season 1 Episode 4

In this episode, hosts Seon Yup Lee and Azdren Coma explore the ways surveys can mislead us, inspired by the book "How to Lie with Maps." They discuss common survey pitfalls, from bad question design to poor sampling, and provide real-world examples, such as misleading questions in political surveys and biased sampling in product ratings. This episode, the first in a series, aims to uncover the frequent errors in surveys that can skew data, whether by accident or intention. Tune in to learn how to spot and avoid these issues in your own survey designs.

Find us online at: instagram.com/totalsurveydesign/
https://taplink.cc/totalsurveydesign
Contact us at: totalsurveydesign@gmail.com

SYL: Hello everyone. Welcome back to the Total Survey Design Podcast.

My name is Seon Yup Lee

AC: And I am Azdren Coma.

AC: In this episode, we will talk about how to lie with surveys.

AC: The inspiration for this episode came from reading the book, How to Lie with Maps, which is a book by Mark Monmonier that shows us how maps can be misleading. Similarly, it is important to acknowledge how data collected through surveys can lead us astray. So, in this episode, we will spend some time covering various problems that emerge from surveys. This won’t be an exhaustive list of problems, which is why this is only part 1 of how to lie with surveys, but it is meant to give you a peek into some of the frequent errors we encounter with surveys, which skew the data, either by accident… or on purpose. We will return to this series of episodes in the future because we know there is so much to cover when it comes to this topic.

SYL: There’s an extreme analogy for this situation: garbage in, garbage out. If a survey is designed poorly or conducted poorly, whatever conclusions are derived from it will be flawed, if not totally inaccurate.

AC: Sometimes surveys are intentionally designed poorly in order to lie through them. But lying with surveys actually takes a lot of skill. More likely than not, people lie through surveys not because they intended to lie but because they don’t know any better… One of the biggest problems is that people think creating a survey is something we can do intuitively, just like writing an email, and that anyone can do it. But there is a lot of science backing the art of survey design… And this is where we come in.

AC: The first way that surveys lie is through the mismatch between the goals of the survey and what it actually measures! Researchers can easily lie through this mismatch simply by cherry-picking which questions to ask or which type of data to collect, often because they know they will get the results that they are looking for. 

SYL: A thought-provoking example of bad survey design comes from Jonathan Haidt’s recently published book, The Anxious Generation. In one of the interviews about the book, he mentions that Mark Zuckerberg, the founder and CEO of Facebook, testified to Congress that using social media apps can have positive mental health benefits for kids. This was during a hearing to determine the impacts of social media on children’s mental health.

Zuckerberg testified based on findings from internal research commissioned by Facebook, later made public by whistleblower Frances Haugen. The researchers surveyed kids and asked, “How does using social media make you feel?” Kids often say more positive things about social media than negative ones; they report that using it makes them feel good. However, this is different from determining the mental health outcomes of children who use social media. Other studies have shown that social media use among children and teenagers is associated with social comparison, body image issues, anxiety, and other problems that aren’t obvious if we only listen to Zuckerberg’s statements.

What makes someone feel better doesn’t always mean there are mental health benefits. On the contrary, a lot of things that make us feel better in the moment may have negative health outcomes. Consider addictive things in life, like smoking, drugs, gambling, or binge eating: ask a person how the activity makes them feel while doing it, and they will probably say that it makes them feel good. But measuring the health impact of an activity, even just the mental health impact, is not the same as asking how one feels while doing it.

SYL: Another way a survey can be used to lie is by having a bad questionnaire, either one that is way too long, with many irrelevant questions, or by writing bad questions.

This technique of lying through poor question design is common in the private industry sector and in politics… And we have plenty of examples to share. 

AC: In one example, a survey shared on an official GOP website asked, “Are You Against The Biden Administration’s Unconstitutional Attack On Our Freedoms Through The Vaccine Mandate?”

Now, before we get to the response options, let’s examine the numerous issues with the way this question is formulated. 

·         First of all, the question is leading. By asking, “Are you against something?” it signals that the expected response is opposition. An honest survey, on the other hand, would ask, “Are you for or against something?”, in essence communicating to the respondent that both support and opposition are acceptable responses.

·         Second, there’s social desirability bias associated with opposing something labeled unconstitutional. A question about the respondent’s views on vaccine mandates shouldn’t include the pollster’s interpretation of whether they are unconstitutional; that’s up to the respondent to determine.

·         Third, this is a double-barreled question. It asks about two things at once: an attack on our freedoms AND the vaccine mandate itself.

Looking beyond just the formulation of the question itself, the response options are also very problematic. 

There are two response options: “Yes, I stand for Freedom” and “No, I support Tyranny.”

Again, these response options have many of the problems we saw with the question itself. They are leading, and they are double-barreled. I mean, who doesn’t stand for freedom? And who would answer that they support tyranny?! So this pollster distributes this survey, probably to collect contact information on potential donors, and at the end of the day they end up with data so skewed that I can’t really imagine a legitimate place where the findings would be helpful. Maybe as a talking point on the campaign trail, maybe in a discussion at some board meeting, but this is just another example of how to lie through surveys.

AC: Another example of how to lie through surveys is through poor sampling design. This issue is more about sample selection and statistical interpretation.

SYL: As a matter of fact, the first example I want to share doesn’t even involve a survey per se, but rather a product rating, which is a form of survey in its own way, and it illustrates the issues and importance of proper sampling.

Imagine that someone you know started selling clothing that they designed themselves. They set up a store on Amazon, they open social media accounts, and their products have free shipping on orders above $50.

Your friend never misses the chance to make a post on a public holiday to share a link to their store. And on Memorial Day, they decide to make a new post. So, you decide to finally check out their page on Amazon. You contemplate buying a sweater in support of your friend’s business. But you also know that buying clothes online is not easy. There’s no way of knowing the quality of the products without trying them on first. So, what do you do? You check the ratings or reviews of products.

The first thing you see is that it has five stars. “That’s great!” you say. Except it only has seven reviews. In other words, you cannot fully trust the five-star rating because only seven people have rated the product; the sample size is really small. If a clothing product has a couple thousand reviews, even if it is rated slightly lower, like a 4.7 or a 4.5, you can still be fairly confident that the product is actually good quality.
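To make that intuition concrete, here is a minimal sketch in Python of how the uncertainty around an average star rating shrinks with sample size. The 1.0-star spread is a hypothetical assumption for illustration, not a figure from the episode.

```python
import math

def rating_margin(mean, sd, n, z=1.96):
    """Approximate 95% margin of error for an average star rating,
    given a hypothetical standard deviation of individual ratings."""
    return z * sd / math.sqrt(n)

# Assumed spread of individual ratings: 1.0 stars (hypothetical)
small = rating_margin(5.0, 1.0, 7)     # seven reviews
large = rating_margin(4.6, 1.0, 2000)  # a couple thousand reviews

print(f"n=7:    +/-{small:.2f} stars")   # +/-0.74 stars
print(f"n=2000: +/-{large:.2f} stars")   # +/-0.04 stars
```

With only seven reviews, the true average could plausibly sit anywhere within about three-quarters of a star of what you see; with a couple thousand, the rating is pinned down to a few hundredths.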

The second problem, beyond the small sample size, is a potential problem with sample selection. You recall that a few weeks ago your friend was asking, on their personal social media page, for friends and family to go onto Amazon and leave good ratings. In other words, they drew their sample of reviewers from a convenient source. If those seven reviews come only from friends and family, you can imagine that your confidence in the product would be lower than if they were seven random customer reviews. Even for a product by a famous celebrity with thousands of reviews, if those reviews came from a fan campaign, knowing that the rating is the result of a coordinated effort would undermine your trust in it.

Considering the examples I have just shared, you can see how people can easily lie through surveys by manipulating sample selection and sample size. In other words, where respondents are found and how many people are surveyed can seriously alter the outcome of survey results.

SYL: Beyond just not reporting sample sizes and sampling methods, a lot of surveys don’t report the response rate. This is another way that people can hide underlying issues with their surveys. 

AC: The final example I want to talk about is how to lie through surveys through bad reporting of survey results.

The response rate is, most basically, the proportion of people who responded to the survey out of those who received it. There are variations in how researchers calculate it. For example, if 1,000 people received the survey and only 200 responded, the response rate is 20%. Some surveys have a response rate of 50% or more; others have a response rate of only 2%. If a survey has a very low response rate, you might get skeptical about the quality of the survey design. That is why, when a surveyor does not report the response rate, they might be lying by hiding a bit of how the sausage is made.
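The arithmetic in that example can be written as a one-line helper. This is only a sketch of the simplest definition; bodies like AAPOR define several more refined variants that handle partial completes and ineligible cases.

```python
def response_rate(completed, invited):
    """Simplest response rate: share of invited people who completed
    the survey. Real standards (e.g., AAPOR) define finer variants."""
    return completed / invited

# The episode's example: 1,000 invited, 200 responded
rate = response_rate(200, 1000)
print(f"{rate:.0%}")  # prints "20%"
```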

Another way that some researchers can lie through bad reporting is by not reporting how much compensation they gave their respondents. On crowdsourcing websites like Prolific, a lot of posted surveys pay somewhere around 15 dollars per hour on average. We will go over the costs of surveys in another episode.

But for this episode, suppose that 100 surveys pay an average of 15 dollars per hour, and then there is one survey that pays 3 dollars per hour. You could imagine that the data from that 3-dollar survey is possibly terrible. The people who completed it might be substantially different from the rest of the population: they might have been more desperate for those 3 dollars per hour, or they might have responded to the questions without much thought because of the low pay.

What’s worse is that if the main thing that the survey is measuring is related to income or money, then there is a chance that the data will be particularly skewed. And unfortunately, a lot of published papers do not report how much they are paying their respondents.

Finally, there is something called p-hacking. P-hacking essentially means taking the data you have collected and running different statistical tests on it, adding or removing variables and cleaning the data in different ways, until you get the results you are looking for. Sometimes you take a continuous variable, turn it into a categorical variable, and your results suddenly show statistical significance. Not disclosing that you did this data transformation is a form of dishonest reporting of your survey results.
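As an illustration of why running many tests invites false positives, here is a small, hypothetical Python simulation (not from the episode): both groups are drawn from the same population, so any "effect" is pure noise, yet testing 20 unrelated outcomes and keeping only the smallest p-value will often turn up "significance" by chance alone.

```python
import math
import random

random.seed(42)

def two_sample_p(a, b):
    """Approximate two-sided p-value for a difference in means
    (z-test with a normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Both groups come from the SAME population: there is nothing to find.
# A p-hacker tests 20 unrelated outcome variables and reports the best one.
p_values = []
for _ in range(20):
    a = [random.gauss(0, 1) for _ in range(50)]
    b = [random.gauss(0, 1) for _ in range(50)]
    p_values.append(two_sample_p(a, b))

print(min(p_values))  # often below 0.05, despite no real effect
```

The honest fix is to report every test that was run (and correct for multiple comparisons), not just the one that crossed the significance threshold.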

In the end, a healthy suspicion of survey findings and data is part of understanding survey design and how people can lie through surveys.

SYL: If you have any questions, comments, or suggestions for future episodes, contact us at totalsurveydesign@gmail.com. Totalsurveydesign is all one word.

This episode was recorded at the Dimensions Lab, at the Washington State University Library. We would like to acknowledge the help of Thom Allen at the SESRC. Also, a thank you to Don Coma. The music for this episode is called “Kalimba Relaxation Music” and was created by Kevin MacLeod, and licensed under a Creative Commons license.

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.

POP: Public Opinion Podcast

American Association for Public Opinion Research

NN/g UX Podcast

Nielsen Norman Group

Scope Conditions Podcast

Alan Jacobs and Yang-Yang Zhou

FiveThirtyEight Politics

ABC News, 538, FiveThirtyEight, Galen Druke