The origin of the data that artificial intelligence (AI) companies use to train their models is murky and often controversial. A recent YouGov poll found that some Americans are concerned that their data and copyrighted material are being used to train AI models without their permission. Among Americans, 28% believe that AI companies rarely or never ask permission to use someone’s material. Slightly fewer Americans (22%) say that AI companies always or usually ask for permission.
Adults under 30 are more likely than Americans 65 or older to believe that AI companies always get permission to train using copyrighted content (13% vs. 4%).
Data scraping from the internet, a popular method for getting training data, is not popular with Americans. Americans are more than twice as likely to believe that AI models should not be trained using internet data as to think they should be (44% vs. 20%). These opinions cross the aisle: 43% of Democrats and 48% of Republicans say that AI companies should not be allowed to train their models on the open internet; 24% of Democrats and 21% of Republicans say they should be allowed to.
The use of copyrighted data for AI training has been controversial among artists and creators. Regarding lawsuits filed by creators against AI companies for illegally or improperly using their works, Americans are more likely to think the creators will win than to think they will lose (40% vs. 18%). About two in five Americans either aren’t sure (28%) or think both parties are equally likely to win (14%).
Americans who have college degrees are more certain about which way the suits will go than are those without a college degree, but in both groups more people expect the copyright owners to win than expect the AI companies to win.
See the results for this poll:
- AI models are often trained on content from many different outside sources. How often do you think AI companies have permission to use this content from its copyright holders?
- Regardless of your own opinion, who do you think is more likely to win in legal battles between AI companies and owners of copyrighted material who are suing them for improper use of their content?
- Do you think that AI companies should or should not be allowed to train their models on the open internet, which includes large amounts of copyrighted data?
Methodology: The Daily Questions survey was conducted online on May 2-3, 2024 among 12,876 U.S. adults. The sample was weighted according to gender, age, race, education, U.S. census region, and political party. The margin of error for the overall sample is approximately 1%.
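For readers who want to check the margin of error, the sketch below applies the textbook formula for a proportion at 95% confidence. It assumes a simple random sample and the most conservative case of a 50% response share; the function name `margin_of_error` and its parameters are illustrative, not YouGov's own method. Weighting usually widens the margin somewhat, which may be why the published figure is rounded to approximately 1%.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z_score: float = 1.96) -> float:
    """Approximate margin of error for a proportion.

    Assumes a simple random sample (an assumption; the poll itself is weighted).
    proportion=0.5 gives the widest margin; z_score=1.96 corresponds to 95% confidence.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return z_score * standard_error


# Sample size reported in the methodology note
moe = margin_of_error(12_876)
print(f"{moe * 100:.2f} percentage points")  # prints "0.86 percentage points"
```

Under these assumptions the unweighted margin comes to a little under 1 percentage point, which is consistent with the stated figure.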
Image: Getty (Weiquan Lin)