UNIVERSITY of NOTRE DAME
Licensed to Learn: Mitigating Copyright Infringement Liability of Generative AI Systems Through Contracts
Frank Morton-Park
Introduction
When given prompts by users, generative artificial intelligence (AI) systems are capable of creating works of art, music, and literature. These systems are increasingly popular with users as well as with investors, who have invested billions of dollars. While these billion-dollar investments surely contributed to the recent success of generative AI systems, the recent advancements in this technology are also attributed to the massive datasets, containing billions of copyrighted works, used to train the AI systems. While consumers (and investors) are excited by the potential of generative cultural production, some artists have recently expressed their dismay at the presence of their work in datasets used to train popular AI systems, and are pursuing claims of copyright infringement in a class-action lawsuit against companies creating or employing such AI systems. Even when the creators of generative AI systems could more easily license copyrighted works, say from another company whose business is licensing copyrighted works, these companies forgo any up-front transaction costs with respect to copyright owners, preferring instead to simply incorporate copyrighted works into their systems without permission from the copyright owners.
Such extensive, unauthorized use of copyrighted works to train generative AI systems is typically justified by the assumption that these uses of copyrighted works are permitted as fair use. However, there is a strong likelihood that courts will find such uses of copyrighted works by modern generative AI systems not to be fair use, because the uses of these systems trained on copyrighted works are increasingly commercial and expressive in a way that directly encroaches on the potential market for the original copyrighted works. Further, researchers have demonstrated that these generative systems are capable of creating images that almost exactly replicate an input image used for training, which undermines arguments that the use of copyrighted works for training generative AI systems is transformative.
Another justification for the extensive use of copyrighted works without authorization is based on the notion that AI systems are just as entitled as humans to consume (that is, to read, listen to, and view) copyrighted works that are freely available on the Internet. This assumption is central to techniques for text and data mining (TDM) used to scrape copyrighted works from across the Internet to form massive datasets, as well as to the training of AI systems on such datasets. Closely intertwined with the assumption of fair use is the presumption that contracts such as dataset license agreements, terms of service, and end-user license agreements (EULAs) will absolve parties of copyright infringement liability. As dataset assembly techniques and generative AI systems stretch the boundaries of the fair use doctrine to the point of failure, such contracts alone are unlikely to protect parties (namely dataset assemblers, creators of AI systems, and end users of such systems) from claims of copyright infringement, unless these contracts authorize the use of copyrighted works for generative AI systems and those offering such contracts hold the copyright rights needed to grant such authorization.
Notre Dame Journal on Emerging Technologies ©2020