
US Copyright Office seeks public views on study into AI and copyright

The US Copyright Office is seeking public comments on specific questions to bring clarity to the copyright issues raised by AI.

The US Copyright Office on August 30 launched a study into artificial intelligence (AI) and copyright, seeking public comments on a number of copyright issues raised by recent advances in AI, such as:

  1. The use of copyrighted works to train AI models.
  2. The copyrightability of material generated using AI systems.
  3. Potential liability for infringing works generated using AI systems.
  4. The treatment of generative AI outputs that imitate the identity or style of human artists.
  5. The appropriate levels of transparency and disclosure with respect to the use of copyrighted works.

The agency wants to assess whether legislative or regulatory steps are warranted.

You can submit your comments and views on this subject by October 18, 2023.

Copyright is the biggest concern with AI right now, especially generative AI. Training generative AI models involves large sets of data, some of which is obtained by scraping publicly available information on the internet. In many cases, however, this information might be copyrighted. The trained models might then output work that draws on this data without providing appropriate compensation or attribution to the source. This has already resulted in multiple copyright lawsuits, but since the issue is novel, even courts are not fully equipped to deal with it.

There is also the separate concern of whether AI-generated content should be eligible for copyright. This, too, is unfamiliar ground. The Copyright Office, for instance, was sued by Stephen Thaler after it refused copyright registration for an image created by his AI system. A US court in August sided with the Copyright Office in this case, noting that US copyright law protects “only works of human creation.”

The Copyright Office’s inquiry could help bring more clarity to both of the issues outlined above.


Full list of questions posed by the US Copyright Office

Quoted verbatim from the inquiry.

General Questions:

  1. As described above, generative AI systems have the ability to produce material that would be copyrightable if it were created by a human author. What are your views on the potential benefits and risks of this technology? How is the use of this technology currently affecting or likely to affect creators, copyright owners, technology developers, researchers, and the public?
  2. Does the increasing use or distribution of AI-generated material raise any unique issues for your sector or industry as compared to other copyright stakeholders?
  3. Please identify any papers or studies that you believe are relevant to this Notice. These may address, for example, the economic effects of generative AI on the creative industries or how different licensing regimes do or could operate to remunerate copyright owners and/or creators for the use of their works in training AI models. The Office requests that commenters provide a hyperlink to the identified papers.
  4. Are there any statutory or regulatory approaches that have been adopted or are under consideration in other countries that relate to copyright and AI that should be considered or avoided in the United States? How important a factor is international consistency in this area across borders?
  5. Is new legislation warranted to address copyright or related issues with generative AI? If so, what should it entail? Specific proposals and legislative text are not necessary, but the Office welcomes any proposals or text for review.

Training:

  1. What kinds of copyright-protected training materials are used to train AI models, and how are those materials collected and curated?
    1. How or where do developers of AI models acquire the materials or datasets that their models are trained on? To what extent is training material first collected by third-party entities (such as academic researchers or private companies)?
    2. To what extent are copyrighted works licensed from copyright owners for use as training materials? To your knowledge, what licensing models are currently being offered and used?
    3. To what extent is non-copyrighted material (such as public domain works) used for AI training? Alternatively, to what extent is training material created or commissioned by developers of AI models?
    4. Are some or all training materials retained by developers of AI models after training is complete, and for what purpose(s)? Please describe any relevant storage and retention practices.
  2. To the extent that it informs your views, please briefly describe your personal knowledge of the process by which AI models are trained. The Office is particularly interested in:
    1. How are training materials used and/or reproduced when training an AI model? Please include your understanding of the nature and duration of any reproduction of works that occur during the training process, as well as your views on the extent to which these activities implicate the exclusive rights of copyright owners.
    2. How are inferences gained from the training process stored or represented within an AI model?
    3. Is it possible for an AI model to “unlearn” inferences it gained from training on a particular piece of training material? If so, is it economically feasible? In addition to retraining a model, are there other ways to “unlearn” inferences from training?
    4. Absent access to the underlying dataset, is it possible to identify whether an AI model was trained on a particular piece of training material?
  3. Under what circumstances would the unauthorized use of copyrighted works to train AI models constitute fair use? Please discuss any case law you believe relevant to this question.
    1. In light of the Supreme Court’s recent decisions in Google v. Oracle America and Andy Warhol Foundation v. Goldsmith, how should the “purpose and character” of the use of copyrighted works to train an AI model be evaluated? What is the relevant use to be analyzed? Do different stages of training, such as pre-training and fine-tuning, raise different considerations under the first fair use factor?
    2. How should the analysis apply to entities that collect and distribute copyrighted material for training but may not themselves engage in the training?
    3. The use of copyrighted materials in a training dataset or to train generative AI models may be done for noncommercial or research purposes. How should the fair use analysis apply if AI models or datasets are later adapted for use of a commercial nature? Does it make a difference if funding for these noncommercial or research uses is provided by for-profit developers of AI systems?
    4. What quantity of training materials do developers of generative AI models use for training? Does the volume of material used to train an AI model affect the fair use analysis? If so, how?
    5. Under the fourth factor of the fair use analysis, how should the effect on the potential market for or value of a copyrighted work used to train an AI model be measured? Should the inquiry be whether the outputs of the AI system incorporating the model compete with a particular copyrighted work, the body of works of the same author, or the market for that general class of works?
  4. Should copyright owners have to affirmatively consent (opt in) to the use of their works for training materials, or should they be provided with the means to object (opt out)?
    1. Should consent of the copyright owner be required for all uses of copyrighted works to train AI models or only commercial uses?
    2. If an “opt out” approach were adopted, how would that process work for a copyright owner who objected to the use of their works for training? Are there technical tools that might facilitate this process, such as a technical flag or metadata indicating that an automated service should not collect and store a work for AI training uses?
    3. What legal, technical, or practical obstacles are there to establishing or using such a process? Given the volume of works used in training, is it feasible to get consent in advance from copyright owners?
    4. If an objection is not honored, what remedies should be available? Are existing remedies for infringement appropriate or should there be a separate cause of action?
    5. In cases where the human creator does not own the copyright—for example, because they have assigned it or because the work was made for hire—should they have a right to object to an AI model being trained on their work? If so, how would such a system work?
  5. If copyright owners’ consent is required to train generative AI models, how can or should licenses be obtained?
    1. Is direct voluntary licensing feasible in some or all creative sectors?
    2. Is a voluntary collective licensing scheme a feasible or desirable approach? Are there existing collective management organizations that are well-suited to provide those licenses, and are there legal or other impediments that would prevent those organizations from performing this role? Should Congress consider statutory or other changes, such as an antitrust exception, to facilitate negotiation of collective licenses?
    3. Should Congress consider establishing a compulsory licensing regime? If so, what should such a regime look like? What activities should the license cover, what works would be subject to the license, and would copyright owners have the ability to opt out? How should royalty rates and terms be set, allocated, reported and distributed?
    4. Is an extended collective licensing scheme a feasible or desirable approach?
    5. Should licensing regimes vary based on the type of work at issue?
  6. What legal, technical or practical issues might there be with respect to obtaining appropriate licenses for training? Who, if anyone, should be responsible for securing them (for example when the curator of a training dataset, the developer who trains an AI model, and the company employing that model in an AI system are different entities and may have different commercial or noncommercial roles)?
  7. Is it possible or feasible to identify the degree to which a particular work contributes to a particular output from a generative AI system? Please explain.
  8. What would be the economic impacts of a licensing requirement on the development and adoption of generative AI systems?
  9. Please describe any other factors you believe are relevant with respect to potential copyright liability for training AI models.

Transparency and Recordkeeping:

  1. In order to allow copyright owners to determine whether their works have been used, should developers of AI models be required to collect, retain, and disclose records regarding the materials used to train their models? Should creators of training datasets have a similar obligation?
    1. What level of specificity should be required?
    2. To whom should disclosures be made?
    3. What obligations, if any, should be placed on developers of AI systems that incorporate models from third parties?
    4. What would be the cost or other impact of such a recordkeeping system for developers of AI models or systems, creators, consumers, or other relevant parties?
  2. What obligations, if any, should there be to notify copyright owners that their works have been used to train an AI model?
  3. Outside of copyright law, are there existing U.S. laws that could require developers of AI models or systems to retain or disclose records about the materials they used for training?

Copyrightability of Generative AI Outputs

  1. Under copyright law, are there circumstances when a human using a generative AI system should be considered the “author” of material produced by the system? If so, what factors are relevant to that determination? For example, is selecting what material an AI model is trained on and/or providing an iterative series of text commands or prompts sufficient to claim authorship of the resulting output?
  2. Are any revisions to the Copyright Act necessary to clarify the human authorship requirement or to provide additional standards to determine when content including AI-generated material is subject to copyright protection?
  3. Is legal protection for AI-generated material desirable as a policy matter? Is legal protection for AI-generated material necessary to encourage development of generative AI technologies and systems? Does existing copyright protection for computer code that operates a generative AI system provide sufficient incentives?
    1. If you believe protection is desirable, should it be a form of copyright or a separate sui generis right? If the latter, in what respects should protection for AI-generated material differ from copyright?
  4. Does the Copyright Clause in the U.S. Constitution permit copyright protection for AI-generated material? Would such protection “promote the progress of science and useful arts”? If so, how?

Infringement by works generated using AI systems

  1. Can AI-generated outputs implicate the exclusive rights of preexisting copyrighted works, such as the right of reproduction or the derivative work right? If so, in what circumstances?
  2. Is the substantial similarity test adequate to address claims of infringement based on outputs from a generative AI system, or is some other standard appropriate or necessary?
  3. How can copyright owners prove the element of copying (such as by demonstrating access to a copyrighted work) if the developer of the AI model does not maintain or make available records of what training material it used? Are existing civil discovery rules sufficient to address this situation?
  4. If AI-generated material is found to infringe a copyrighted work, who should be directly or secondarily liable—the developer of a generative AI model, the developer of the system incorporating that model, end users of the system, or other parties?
    1. Do “open-source” AI models raise unique considerations with respect to infringement based on their outputs?
  5. If a generative AI system is trained on copyrighted works containing copyright management information, how does 17 U.S.C. 1202(b) apply to the treatment of that information in outputs of the system?
  6. Please describe any other issues that you believe policymakers should consider with respect to potential copyright liability based on AI-generated output.

Labeling or Identification

  1. Should the law require AI-generated material to be labeled or otherwise publicly identified as being generated by AI? If so, in what context should the requirement apply and how should it work?
    1. Who should be responsible for identifying a work as AI-generated?
    2. Are there technical or practical barriers to labeling or identification requirements?
    3. If a notification or labeling requirement is adopted, what should be the consequences of the failure to label a particular work or the removal of a label?
  2. What tools exist or are in development to identify AI-generated material, including by standard-setting bodies? How accurate are these tools? What are their limitations?

Additional Questions About Issues Related to Copyright

  1. What legal rights, if any, currently apply to AI-generated material that features the name or likeness, including vocal likeness, of a particular person?
  2. Should Congress establish a new federal right, similar to state law rights of publicity, that would apply to AI-generated material? If so, should it preempt state laws or set a ceiling or floor for state law protections? What should be the contours of such a right?
  3. Are there or should there be protections against an AI system generating outputs that imitate the artistic style of a human creator (such as an AI system producing visual works “in the style of” a specific artist)? Who should be eligible for such protection? What form should it take?
  4. With respect to sound recordings, how does section 114(b) of the Copyright Act relate to state law, such as state right of publicity laws? Does this issue require legislative attention in the context of generative AI?
  5. Please identify any issues not mentioned above that the Copyright Office should consider in conducting this study.


MediaNama is the premier source of information and analysis on Technology Policy in India. More about MediaNama, and contact information, here.

© 2008-2021 Mixed Bag Media Pvt. Ltd. Developed By PixelVJ
