In its motion to dismiss parts of The New York Times' lawsuit against it, Microsoft says that large language models (LLMs) are the essence of what copyright law considers transformative and fair use. The New York Times filed the lawsuit against Microsoft and OpenAI in December last year, arguing that the two companies trained their artificial intelligence (AI) models by infringing on millions of its copyrighted articles. OpenAI has since disputed this claim, stating that the publication manipulated prompts to produce regurgitations of its articles.
“Nowhere does The Times allege that anyone other than its legal team would actually do any of this, and certainly not on a scale that merits the doomsday futurology it pushes,” Microsoft said, comparing the copyright concerns the publication has raised about AI to those the Motion Picture Association of America once raised about the videocassette recorder (VCR).
Microsoft seeks to dismiss the following claims:
That Microsoft is liable for end-user copyright infringement:
Despite seeking to hold Microsoft liable for end-user infringement, The New York Times has not alleged that any such infringement actually took place. Microsoft says that while the publication insinuates that GPT models recite large portions of its content with minimal prompting, it buries the admission that its prompts each comprised a short snippet from the beginning of a New York Times article, often verbatim paragraphs. Microsoft adds that the complaint does not explain why any person already in possession of these articles would want to feed parts of them into ChatGPT as prompts.
The publication also fails to adequately claim that Microsoft contributed to end-user copyright infringement. A contributory infringer is “one who, with knowledge of the infringing activity, induces, causes or materially contributes to the infringing conduct of another.” The complaint does not allege that Microsoft encouraged users to use its products for infringement, or that it promoted or induced infringement. Further, to make such a claim The New York Times would have to establish actual knowledge of specific acts of infringement. Since no such acts have been alleged, Microsoft argues that it has no knowledge of end users carrying out infringing activities with its AI models, and so cannot be liable for contributing to such infringement.
That Microsoft violated copyright management information provisions under the Digital Millennium Copyright Act:
Copyright management information (CMI) is information about a copyrighted work, its creator, its owner, or the use of the work that is conveyed in connection with the work (such as its title or the copyright owner’s name). To make a claim of CMI violation, a person has to allege that the CMI was “intentionally removed or altered” from a copy of the work. Further, it must be established that the offending party removed CMI “knowing, or having reasonable grounds to know, that it will induce, enable, facilitate, or conceal an infringement” of copyright. Microsoft says that over the past 18 months, many plaintiffs have failed to adequately plead this claim against generative AI tools. It breaks the alleged CMI violation down into two parts:
- Alleged infringement in training data: Microsoft says that the point of CMI is to inform the public that something is copyrighted. Training takes place out of public view, and training data itself is not disseminated. The company argues that even the New York Times’ complaint acknowledges that Microsoft believes its conduct of training AI language models on copyrighted content is protected under the “fair use” doctrine of copyright law. The New York Times has not alleged that Microsoft’s belief in its fair use rights is unreasonable. The allegation of CMI removal from training data is also not substantiated, with no actual examples of CMI removal being mentioned in The New York Times’ complaint.
- Alleged infringing output: Here, Microsoft argues that getting a GPT model to generate The New York Times’ content would require the user to already know its origins. Further, CMI removal claims lie only against works that are “substantially or entirely reproduced.” The New York Times’ claims about search results generated by the GPT-based Bing tools vaguely refer to “copies” and “derivatives of Times Works” without explaining what precisely was copied. Elsewhere in its complaint, The New York Times has also admitted that these results are answers to user queries and may include extensive paraphrasing and direct quotes. Microsoft says that the mere non-inclusion of CMI with “paraphrases” or “quotes” does not constitute the removal of CMI from a substantially identical copy of a work.
That the GPT models compete with the New York Times by creating the same or similar content:
The New York Times claims that Microsoft’s AI models threaten high-quality journalism and hinder the publication’s ability to grow its subscriber base. It argues that the models threaten the reputation of its product review website Wirecutter by falsely attributing recommendations to it. On Wirecutter’s website, recommendations contain hyperlinks to purchase the recommended products, which can earn Wirecutter commissions. Since GPT tools do not include these links, The New York Times argues that this amounts to “misappropriation of commercial referrals.” Microsoft dismisses this as mere speculation, saying there isn’t a single real-world example of such an occurrence.
Also read:
- OpenAI Responds To The New York Times Copyright Lawsuit Calling It Meritless
- OpenAI Sued For Copyright Infringement By The Intercept, Raw Story And AlterNet
- Elon Musk Sues OpenAI For Violating Founding Agreement Of Developing Artificial General Intelligence For Public Benefit