Authors sue Nvidia for training its AI platform NeMo with copyrighted materials

The writers claimed that their works were used in a dataset containing 196,640 other books used to train NeMo.

A group of authors has sued Nvidia, a major player in the chipmaking industry, for copyright infringement. According to a report by Reuters, three authors allege that Nvidia used their copyrighted written works without permission in a dataset used to train Nvidia’s artificial intelligence (AI) platform NeMo. Brian Keene, Abdi Nazemian and Stewart O’Nan filed a proposed class-action lawsuit on 8 March in federal court in San Francisco, seeking unspecified damages on behalf of people in the United States whose copyrighted works were included in the dataset used to train the AI.

The writers claimed that their works were part of a dataset containing 196,640 other books that was used to train NeMo, before the dataset was taken down in October 2023 “due to reported copyright infringement”. According to the lawsuit, the takedown amounts to Nvidia “admitting” that it trained NeMo on the data and thereby infringed the authors’ copyright, the Reuters report said.

This is not the first time authors have accused AI companies of copyright infringement.

  • In December 2023, the New York Times (NYT) filed a lawsuit against OpenAI, claiming that “millions of articles published by the Times” were used to train OpenAI’s large language models (LLMs).
  • US writer Julian Sancton filed a copyright lawsuit against OpenAI and Microsoft in November 2023, on behalf of himself and other non-fiction writers, claiming that the two companies misused his work.
  • In September 2023, the Authors Guild and a group of 17 prominent authors, including George R.R. Martin and David Baldacci, filed a copyright infringement lawsuit against OpenAI. The guild described the unlicensed use of books to train LLMs as an “existential threat to the author profession.”
  • Also in September 2023, authors Michael Chabon, David Henry Hwang, Rachel Louise Snyder and Ayelet Waldman filed a lawsuit alleging that OpenAI trained ChatGPT on their works without permission and that the AI’s outputs are “derivative” works that infringe their copyright.
  • In July 2023, author Sarah Silverman and others filed lawsuits against OpenAI and Meta, alleging that the companies illegally used copyrighted written works to train their LLMs.
  • Two Massachusetts-based writers, Paul Tremblay and Mona Awad, filed a proposed class-action lawsuit claiming that ChatGPT was trained on data copied from thousands of copyrighted books without permission, infringing their copyright.

