Total Survey Design
Total Survey Design is a podcast for explaining the complexities of survey design. This podcast serves a diverse audience, including small business owners, nonprofits, industry professionals, academics, and students. Each season features 10 to 20 concise episodes, each lasting 5 to 20 minutes, covering topics from survey utility to sample sizes, and question design to total survey error. Episode content includes insightful discussions, expert interviews, and special event coverage to enhance your survey skills and understanding.
Sample Design - Part 2. The Sampling Frame
In our second episode focusing on sample design, we dive into the intricacies of creating a sampling frame. We explain what a sampling frame is and why it's crucial for ensuring your survey reaches the right target population. We discuss the four key elements of an ideal sampling frame: comprehensiveness, accuracy, specificity, and accessibility. Additionally, we highlight real-world examples and practical tips for researchers, from using address-based databases to leveraging online survey platforms. The episode concludes with brief phone interviews with listeners, emphasizing the potential biases in online panels and the importance of a well-constructed sampling frame.
And thank you to our listeners and friends who joined us for this episode.
Find us online at: instagram.com/totalsurveydesign/
https://taplink.cc/totalsurveydesign
Contact us at: totalsurveydesign@gmail.com
AC: This is our second episode on sample design, and it focuses specifically on the sampling frame.
AC: Before we begin our episode, we would like to thank our listeners. We have listeners across the globe, in over 30 countries. We have released 10 episodes so far, and we have made it this far thanks to all of you.
And thank you to our listeners and friends who have been sending some good feedback and suggestions. Towards the end of this episode, we have a segment of brief phone interviews with some of our listeners.
We’re constantly trying to make the Total Survey Design podcast as informative AND enjoyable as possible, and we appreciate your continued support, listens, and shares.
AC: One of the most challenging tasks for someone conducting a survey is to figure out their sampling frame.
The sampling frame is essentially a list or database that contains all the entities, such as individuals, of the target population from which a sample will be drawn. It is an essential component of conducting good research. Without a proper sampling frame, there is no way of ensuring that you are reaching your target population, which ultimately means that you have little basis for saying that your survey findings are representative of the people you are trying to survey.
SYL: So, how do you determine a sampling frame?
The best approach would be to begin by clearly identifying your target population. Once you know who your target population is, then you can figure out an appropriate sampling frame. We have plans to record another episode specifically dedicated to identifying your target population. But for now, let’s focus on the sampling frame.
An ideal sampling frame has four elements.
One, it is comprehensive. That means it includes everyone in the target population without leaving out any part of it, whether intentionally or not. In other words, your survey’s sampling frame should include everyone who is in the target population and exclude everyone who is not.
If a survey has a target population of restaurant workers, the sampling frame should not include everyone in the city, everyone on Prolific, or even everyone in the restaurant. Ideally, the sampling frame should include only the target population of restaurant workers. This might be a roster of workers from the restaurant that you want to survey.
AC: Second, an ideal sampling frame is accurate, where the information in the sampling frame is correct and up to date.
In the example above, if the sampling frame of restaurant workers came from an old roster, it might include people who are not working at the restaurant anymore and exclude new workers. In this case, distributing the survey using an inaccurate sampling frame might harm your data because the survey is reaching the wrong people. If the purpose of the survey was to figure out what else management can do to improve working conditions, a former worker might be motivated to give inaccurate information, and new workers with fresh eyes on working conditions would be left out.
SYL: Third, an ideal sampling frame is as specific to the target population as possible.
Continuing with the example above, suppose you are interested in gathering feedback from just one specific group of workers, such as wait staff, kitchen crew, management, or cleaning staff. If the roster of workers at the restaurant includes people who shouldn’t be surveyed, then the sampling frame needs to be refined to target just those whom you need to survey.
AC: Fourth, you actually need to have access to the sampling frame. This is a consideration that, unfortunately, is a barrier to a lot of great research. Even if you have found the perfect sampling frame, it does not really help if you do not have permission to access the data. So, you should look for a sampling frame that is readily accessible to you.
From the example above, if you are surveying restaurant workers and have access to the worker roster because it is a survey commissioned by the restaurant itself, then access is not an issue. However, if you are studying restaurant workers as an outsider, you might not have access to the roster, and your sampling frame would need to be reconsidered.
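As a rough illustration of the first three elements, the sketch below compares a sampling frame with a target population using Python sets. All names, roles, and the `audit_frame` helper are made up for this example. In practice you rarely have a full list of the target population to check against, so this is a way to think about the problem, not a procedure to follow.

```python
# Hypothetical data: every name and role below is invented for illustration.
# The target population: current wait staff we want to survey.
target_population = {"Ana", "Ben", "Chen", "Dee"}

# An old roster used as the sampling frame: (name, role)
old_roster = [
    ("Ana", "wait staff"),
    ("Ben", "wait staff"),
    ("Eli", "wait staff"),  # no longer works at the restaurant
    ("Fay", "kitchen"),     # works there, but is outside the target group
]


def audit_frame(frame_names, target):
    """Compare a sampling frame with the target population."""
    frame = set(frame_names)
    return {
        "covered": sorted(frame & target),     # in both frame and target
        "missing": sorted(target - frame),     # in target, absent from frame
        "ineligible": sorted(frame - target),  # in frame, not in target
    }


# Specificity: keep only the role we care about before auditing.
wait_staff_frame = [name for name, role in old_roster if role == "wait staff"]

report = audit_frame(wait_staff_frame, target_population)
print(report)
```

Here "missing" corresponds to a frame that is not comprehensive, and "ineligible" corresponds to a frame that is out of date or not specific enough.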
SYL: Selecting the appropriate sampling frame is difficult because sampling frames are as diverse as the target populations being surveyed.
There is no systematic approach to finding your sampling frame. Each target population has unique characteristics and requirements, which means that what works as a sampling frame for one survey will most likely not be suitable for another.
For instance, a sampling frame for a survey on healthcare practitioners might be drawn from a list of registered healthcare professionals.
On the other hand, a survey on consumer preferences could use a list of recent customers. Because target populations are so diverse, we need to take a careful and tailored approach to select a sampling frame that is most appropriate and that most accurately represents the entire population of interest.
AC: Let’s look at some examples of target populations and their respective sampling frames.
AC: A lot of surveys intend to generalize their findings to the entire US population. That is, the studies are seeking findings that they can say would be applicable to the whole US population.
Unfortunately, we do not have a sampling frame of the entire population of the United States. At least, nothing that is readily and openly accessible. The Social Security Administration has Social Security numbers for nearly every US resident, but that information is not accessible to the public. We also don’t have a publicly accessible database of telephone numbers for the entire US population. And even if we had something like a Yellow Pages for the entire United States population, such directories do not include cell phones, and we know that there are many people who do not have landline telephones.
That is why we need to look at other feasible sources for our sampling frame of the US population.
SYL: It turns out that the US Postal Service actually does have a database of every address in the US. Addresses are already somewhat public information, since you can find them on Google Maps or by just driving around, if you have enough time for that. Some researchers and research companies use this database of addresses to reach a random sample of the US population, or even a targeted sample in a specific geographic location.
We do know that there are some marketing agencies that use this database to spam everyone with advertisements. Still, access to this address-based database is invaluable to researchers who want to do a generalizable study.
Regardless, there are many problems associated with using this address-based approach for your sampling frame. First of all, there are millions of addresses, and it is really difficult to manage the whole database. People move homes. New homes are built. Homes are abandoned or burnt down. For these reasons and many more, it is hard to make sure that the database is accurate and up to date.
Also, some addresses are for businesses, some are private residences, some are for entire apartment buildings, whereas others are for single family homes. In other words, even if you had all the addresses, the data is not ready to use, and there is no way of ensuring that the entire population can be reached, considering also that there are people who may be living in motels, or sleeping in their cars, or in some other transient arrangement.
AC: Researchers accept that there will be a certain amount of survey error, no matter how complete the database of addresses might be. But, to reduce errors, research companies such as Marketing Systems Group (or MSG) offer a service of collecting the addresses from the USPS, cleaning the data to the best of their ability, and selling you a sample under different conditions.
For example, in a recent project, I was looking to collect a stratified sample of US households divided equally based on rural and urban addresses. I needed 4000 addresses, and it cost me about $700. Maybe someday we can get someone from MSG to come talk about their process of developing a sampling frame. But anyway, in a nutshell, these services can reduce a lot of the work and error that goes into generating a sampling frame.
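A stratified draw like the one described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the vendor's actual process: the address frame, the `area` field, and the `stratified_sample` helper are all hypothetical.

```python
import random


def stratified_sample(frame, stratum_key, per_stratum, seed=None):
    """Draw the same number of records at random from each stratum."""
    rng = random.Random(seed)

    # Group the frame's records by stratum label (e.g. "rural" / "urban").
    strata = {}
    for record in frame:
        strata.setdefault(record[stratum_key], []).append(record)

    sample = []
    for label, records in strata.items():
        if len(records) < per_stratum:
            raise ValueError(f"Stratum {label!r} has only {len(records)} records")
        # Simple random sample without replacement within the stratum.
        sample.extend(rng.sample(records, per_stratum))
    return sample


# A made-up address frame: one in five addresses is labeled rural.
frame = [
    {"address_id": i, "area": "rural" if i % 5 == 0 else "urban"}
    for i in range(50_000)
]

# 2000 rural + 2000 urban = 4000 addresses in total.
sample = stratified_sample(frame, "area", per_stratum=2000, seed=42)
print(len(sample))
```

Because equal allocation samples rural and urban addresses at rates that differ from their share of the frame, estimates for the whole population would typically need weights that correct for this.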
Then, survey invitations are sent to the 4000 households through the mail. But it is still necessary to randomize which person in the household takes the survey. Otherwise, homeowners or the so-called heads of the household might be overrepresented among those who respond. So, to reduce this potential bias, you can say something like, "The person with the next upcoming birthday is meant to take this survey." This way, a randomized element is introduced into who takes the survey.
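The next-birthday rule can be written out as a small Python sketch. The household members and the helper names are invented for illustration, and treating a February 29 birthday as February 28 in non-leap years is a simplifying assumption.

```python
import calendar
from datetime import date


def birthday_in_year(month, day, year):
    """Return the birthday's date in a given year."""
    # Assumption: Feb 29 birthdays are observed on Feb 28 in non-leap years.
    if (month, day) == (2, 29) and not calendar.isleap(year):
        return date(year, 2, 28)
    return date(year, month, day)


def days_until_birthday(month, day, today):
    """Days from `today` until the next birthday (0 if it is today)."""
    upcoming = birthday_in_year(month, day, today.year)
    if upcoming < today:
        upcoming = birthday_in_year(month, day, today.year + 1)
    return (upcoming - today).days


def next_birthday_respondent(adults, today):
    """Pick the adult whose birthday comes up soonest."""
    return min(adults, key=lambda a: days_until_birthday(a["month"], a["day"], today))


# A made-up household of three adults.
household = [
    {"name": "Adult A", "month": 3, "day": 14},
    {"name": "Adult B", "month": 11, "day": 2},
    {"name": "Adult C", "month": 7, "day": 30},
]

chosen = next_birthday_respondent(household, today=date(2024, 6, 1))
print(chosen["name"])
```

The selection depends only on the calendar, not on who opens the mail, which is what makes the rule a rough stand-in for random selection within the household.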
SYL: That was about address-based sampling. Now there are platforms, such as Prolific and YouGov, that claim to have a panel of participants that is representative of the US population. Both platforms work similarly.
The way they work is that they have a panel of internet users who have signed up on their platforms and are willing to take surveys for researchers or pollsters. The researcher pays Prolific for a sample. Each participant gets paid a small amount, and Prolific takes a fee proportional to the cost of the study.
According to Prolific, their participants are recruited primarily via word of mouth, including social media. However, when Prolific started out about ten years ago, their users were mainly recruited through three means: social media, like Facebook and Twitter; fliers distributed on different university campuses; and a referral scheme, which was discontinued about five years ago.
On these platforms, researchers can select their sample by filtering on different demographic criteria, such as country, gender, or political identity, or on something like whether participants play more than 3 hours of video games per week, a group for which Prolific claims a pool of nearly 34,000 users. Most importantly, Prolific offers a service to provide you with a representative sample of the US population based on data from the US Census.
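One simple way to think about matching a sample to census figures is proportional quota allocation. The sketch below is an illustration under assumptions, not a description of how Prolific or any other platform builds its samples: the age groups and percentage shares are made-up numbers, and `allocate_quotas` is a hypothetical helper.

```python
def allocate_quotas(shares, sample_size):
    """Split a sample across groups in proportion to integer shares.

    Uses whole-number arithmetic and hands any leftover slots to the
    groups with the largest remainders, so quotas sum to sample_size.
    """
    total = sum(shares.values())
    quotas = {g: sample_size * s // total for g, s in shares.items()}
    remainders = {g: sample_size * s % total for g, s in shares.items()}

    leftover = sample_size - sum(quotas.values())
    for group in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[group] += 1
    return quotas


# Made-up percentage shares by age group (NOT real census figures).
age_shares = {"18-29": 20, "30-44": 25, "45-64": 33, "65+": 22}

quotas = allocate_quotas(age_shares, sample_size=500)
print(quotas)
```

Filling quotas like these makes a sample resemble the population on the chosen variables only. It says nothing about variables that were not used to set the quotas.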
SYL: But, there are some potential issues with using a platform like Prolific. There might be a difference between Prolific users and non-users. So, even if you have a representative sample, it might not truly be representative.
For example, Prolific users might develop familiarity with taking surveys. When people become familiar with surveys and learn how researchers ask questions, we call it losing naivety as participants.
Also, even if the respondents that you’ve found through these platforms demographically look like the US population, they might be meaningfully different from those who are not users of these platforms. The matching process attempts to create a sample that mimics a random sample by using a large set of variables, but the variables are only as extensive as those collected by the US Census. This approach assumes that these variables capture all relevant differences between the panel and the general population, which is a strong assumption that may not hold in practice. The ignorability, smoothness, and common support assumptions underlying the matching process are critical and can be easily violated, leading to biased estimates.
The use of an opt-in panel introduces self-selection bias, as those who choose to participate may differ systematically from those who do not, and therefore the sample might actually not be as representative of the population as you would hope. Recruiting online panels for your surveys could also introduce coverage bias, by excluding individuals such as those without reliable internet access or those who are less comfortable with online surveys.
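To see how coverage bias can distort an estimate, consider a standard decomposition: the bias of a mean computed only from the covered population equals the share of the population that is not covered, multiplied by the difference between the covered and not-covered means. The numbers below are invented purely for illustration, and the function names are hypothetical.

```python
def population_mean(share_not_covered, mean_covered, mean_not_covered):
    """Weighted average of the covered and not-covered groups."""
    share_covered = 1 - share_not_covered
    return share_covered * mean_covered + share_not_covered * mean_not_covered


def coverage_bias(share_not_covered, mean_covered, mean_not_covered):
    """Bias from estimating the population mean with the covered mean."""
    return share_not_covered * (mean_covered - mean_not_covered)


# Made-up scenario: 10% of the population cannot be reached online.
share_offline = 0.10
support_online = 0.70   # share of the reachable group supporting a policy
support_offline = 0.40  # share of the unreachable group supporting it

true_mean = population_mean(share_offline, support_online, support_offline)
bias = coverage_bias(share_offline, support_online, support_offline)

print(f"True population mean: {true_mean:.2f}")
print(f"Estimate from covered group: {support_online:.2f}")
print(f"Coverage bias: {bias:+.2f}")
```

The bias shrinks if either the excluded group is small or it resembles the covered group on the measure of interest. It does not shrink just because the sample from the covered group is large.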
Some Reddit users have complained about the process for accessing studies on Prolific. Reddit user QueenMackeral, in a recent post, complained about how there are sometimes a limited number of studies, and the recruiting process can appear unfair and frustrating for the users. Another user Dependent-Emu6395 also made the point that it is possible that those with lower internet speeds, higher pings, slower reaction times, or slower browsers might be disadvantaged with their access to studies.
AC: Related to the issue of these online panels, we decided to include the voices of our listeners. We sent out a call to action on our social media pages and asked people who are willing to answer a couple of quick interview questions. Our goal is to just highlight some of the differences between the general population and a potential online panel.
AC: "Hey, thanks for agreeing to be a part of our podcast..."
"So, there's this website called Prolific, where people can sign up to take surveys. The surveys pop up occasionally, and they usually take a few minutes each and pay about a dollar or two, which works out to roughly $15/hour on average for the surveys that are completed."
"Here is my question to you. Would you, or would you not, be willing to sign up for this website?"
"Please share why you gave the answer that you did."
"Also, would you please tell us a bit about what you do for a living?"
"Thank you for your help!"
[phone interviews inserted here]
AC: The point of these brief phone interviews was to get you to think about the self-selection bias, or in other words, whether or not there might be a meaningful difference between people who are willing to participate on online survey panels like Prolific and those who are not. And if there is a significant difference, you might want to consider if the online panel is actually representative of the general population.
SYL: If you are a researcher, you should be cautious when substituting these online panels for your sampling frame. If you are not a researcher, when you read about findings from research that comes from these online panels, you should take them with a grain of salt.
SYL: Continuing our conversation about sampling frames, let's switch gears to another target population. One target population could be all the graduate students within an academic department of a particular university. If this were a study of my colleagues in my own department, this population would have a simple and easily accessible sampling frame, as the mailboxes of all students could be easily found in the mailroom. And even if I wanted to reach someone who does not have physical access to the building, I could find everyone's email addresses on our publicly available sociology department website.
If my target population is the students in my class, that is another easy sampling frame because I can simply email all my students. But if my target population is all the students in my university, the sampling frame exists, but it is protected by a lot of red tape. It is possible, but I would need to get the permission of some administrators to send an email that would go out to all students to recruit them for a study. In fact, my colleagues and I wanted to do a survey of all international students in the university, but when I reached out to the International Students office at the university for their contact list, they told me it would not be easy for me to get the contact list.
AC: This spring, I launched a survey pro bono for a friend of mine who owns a small business. The target population was current customers and potential customers. There was no easily accessible sampling frame for all of his current and potential customers, so we had to improvise. We included invitations to a survey on a flier that was placed in the bag of every order for a month. He also posted fliers in various public places throughout the city, such as at the local supermarket, the local library, the town hall, and street poles. He also placed 500 fliers on cars that were parked throughout the small city. The goal was to include everyone who might be a current or potential customer.
In conclusion, whether your target population is small or involves millions of people, for you to generalize your data to your target population, it is crucial that you define the sampling frame accurately and appropriately. We hope this episode was helpful in giving you tips and guidelines about how to define your sampling frame.