Major news outlets write an open letter seeking new rules for datasets used for AI training

Noting that generative AI may disseminate protected media content without any remuneration or attribution to the original creators, the outlets pointed out that such practices may “undermine the media industry’s core business models.”

Major international news media organisations, including Associated Press and Agence France-Presse, have written an open letter calling for a legal framework to protect their content from being used by tech companies to train AI models without the consent of the intellectual property rights holders. Other signatories to the letter include European Pressphoto Agency, European Publishers’ Council, USA TODAY Network, Getty Images, National Press Photographers Association, National Writers Union, News Media Alliance, and The Authors Guild.

According to the letter, news publishers have advocated for the following regulatory and industry action:

  1. Transparency over the training datasets used for developing AI models.
  2. Obtaining consent of the “intellectual property rights holders” for utilising their data for training AI tools.
  3. Establishing a mechanism for enabling cooperation between media companies and AI developers regarding terms of access and use of their content.
  4. Requiring generative AI models and users to clearly and specifically identify AI-generated content.
  5. Taking steps to eliminate bias and misinformation from output generated by generative AI services.

Why it matters:

The lack of rules around the use of datasets for training generative AI systems has prompted discussions among regulators on preventing the use of copyright-protected online content without permission. AI developers have also begun working on deals with publishers to agree on terms of use. For example, AP struck a deal with OpenAI to use the company’s technology and product expertise in exchange for licensing part of its news archive to OpenAI. The developments in this space highlight the challenges that companies will face while using data on the web for machine learning and for training AI tools.

Concerns raised: The organisations have noted that generative AI is often trained on protected media content and that such information produced by news publishers can be disseminated by AI applications without any remuneration or attribution to the original creators.

“Such practices undermine the media industry’s core business models, which are predicated on readership and viewership (such as subscriptions), licensing, and advertising. In addition to violating copyright law, the resulting impact is to meaningfully reduce media diversity and undermine the financial viability of companies to invest in media coverage, further reducing the public’s access to high-quality and trustworthy information,” the letter noted.

Compensation for content being used to train ChatGPT: In February, news outlets such as the Wall Street Journal and CNN called out OpenAI for using their articles to train its Artificial Intelligence (AI) software without any usage agreement. They also stated that they must be paid to license content to OpenAI for AI training purposes.

This came after Francesco Marconi, a computational journalist, tweeted on February 15 that ChatGPT was being trained using a large number of news sources. Marconi obtained a list of 20 news sources from ChatGPT using the prompt: “Which specific news sources was chatGPT trained on? Provide a list of the top news sources in your database.” He noted that if there is no agreement with these publishers, this may amount to a violation of the publishers’ terms of service.

MediaNama is the premier source of information and analysis on Technology Policy in India. More about MediaNama, and contact information, here.

© 2008-2021 Mixed Bag Media Pvt. Ltd. Developed By PixelVJ
