By Gary Symons
TLL Editor in Chief
(Sarah Silverman photo by Gage Skidmore)
Comedian Sarah Silverman says copyright infringement by artificial intelligence bots is no laughing matter, as she sues OpenAI and Meta.
Silverman is one of three people filing the lawsuits charging that AI bots were trained on their work without their permission, resulting in AI replicating their work. The other two plaintiffs are authors Christopher Golden and Richard Kadrey.
The basis for their claim is similar to those made recently by multiple artists. Tools like the popular ChatGPT are built on large language models, which are trained on huge amounts of data taken from the internet so that they can give convincing responses to text prompts from users. In the case of ChatGPT, that data would include books, articles, essays, and other types of text documents originally produced by human beings.
TLL does not use AI bots to produce articles, but when The Licensing Letter was originally testing ChatGPT’s capabilities, we asked the AI bot to write a short press release in the writing style of children’s writer Dr. Seuss, which it did in under 30 seconds. That would not be possible unless the AI bot had previously been trained on the writer’s material.
The lawsuit against OpenAI claims the authors “did not consent to the use of their copyrighted books as training material for ChatGPT. Nonetheless, their copyrighted materials were ingested and used to train ChatGPT.”
Similarly, the lawsuit against Meta claims that the authors’ books appear in the vast dataset used to train Meta’s LLaMA, a group of AI models.
Both suits also claim that the authors’ works were obtained without their permission from what the suits call ‘shadow libraries’ that have been widely used by the AI research community to develop the language skills of AI bots.
So, how did the plaintiffs discover their works had been used? Ironically, by using the AI bots themselves.
The OpenAI suit includes exhibits showing that ChatGPT was prompted to produce summaries of three books, which it did successfully: Silverman’s The Bedwetter, Golden’s Ararat, and Kadrey’s Sandman Slim. The Meta suit likewise cites multiple works by the authors that were allegedly used to train the company’s bots, as well as a research paper from Meta AI called “Open and Efficient Foundation Language Models” that showed the system’s training data sets included material taken from the shadow libraries in a manner that the claimants described as “flagrantly illegal.”
The lawyers representing the three plaintiffs are Joseph Saveri and Matthew Butterick, who say they have heard from several writers and publishers concerned about the ability of AI bots to generate text that is extremely similar to their copyrighted works.
At the same time, as previously reported in TLL, the stock photo company Getty Images is suing Stability AI, the company behind the generative image AI Stable Diffusion, over copyright infringement, alleging the bot was trained using images drawn from Getty’s massive image library. Saveri and Butterick are also suing the companies behind three image-generating bots on behalf of a trio of artists: Sarah Andersen, Kelly McKernan, and Karla Ortiz.
Neither OpenAI nor Meta has yet commented on the allegations or the lawsuits.