Don’t Let ChatGPT Write So Much Fan Fiction
Authors can put up with a little mild copyright infringement from their superfans, but not acres of text generated by bots.
The most interesting part of the Hollywood writers' new contract is the AI clause. According to news reports, it explicitly allows the studios to “train” large language models on scripts written by members of the Writers Guild. The agreement implies that permission to train an LLM on a writer's works has value. It follows that the right must be purchased from the writer — not simply taken.
Which brings us to the copyright infringement lawsuit filed last week by leading members of the Authors Guild against OpenAI, the company behind GPT-3.5 and GPT-4 (with more on the way). Although the complaint lists several violations, the major ones come down to two: First, that OpenAI has violated authors' copyrights by training its programs on scanned copies of published works; and, second, that by enabling users to create what amounts to fan fiction on steroids, OpenAI has contributed to violations by others.
As a writer myself, and an acquaintance of some of the plaintiffs, I sympathize. But as a longtime intellectual property teacher, I believe the claim related to training has only a slim chance of succeeding.
The position of the Authors Guild is that the materials on which an AI is trained have value. I agree; so, apparently, does Hollywood. An AI can't generate domestic thrillers if it's never read any domestic thrillers.
The courts, however, will likely hold that the outcome is controlled by the 2015 decision of the US Court of Appeals for the Second Circuit, which held that Google was protected by the “fair use” doctrine when it scanned copyrighted works into its database, in large part because only snippets would be shown to users who searched the texts. I'm not sure the court was right, but it'll be a tough precedent to get around.
Which brings us to the plaintiffs' second claim.
Here I think the plaintiffs have a case. When ChatGPT can produce a detailed outline for a Game of Thrones prequel with the attractive title Dawn of the Direwolves, using George R.R. Martin's characters and settings — well, if that's not an infringing work, nothing is.
The developers will argue that they've only created a tool, that they're not responsible if fans misuse it. And there of course lies the difficulty for authors. Nobody wants to sue the readers. Most popular novelists tolerate fan fiction because it keeps the target audience excited and expectant as the writer struggles to produce the next book. Fan fiction is not, for the most part, competition; its existence proves the popularity of the author.
All of this assumes, however, that the fan fiction is produced by fans — human beings who are not simply excited and energetic but are also working at what we might call a human pace. For evidence that we're exiting that world, look no further than Amazon's recent announcement that “authors” of self-published books will be limited to posting no more than three per day in the Kindle store. Why? Because of “an influx of suspected AI-generated material” — that is, because the books are being written at a nonhuman pace.
The risk, then, is the creation of a constant stream of derivative works. It's one thing for an author to know that excited human beings will now and then create new stories for their characters. It's something else to know that such stories emerge on demand without practical limit. That's a genuine danger to the incentive to become an author — the precise incentive copyright law exists to protect.
This past August, a federal court sustained the position of the Copyright Office that works created entirely by generative AI are not entitled to copyright protection. “Human authorship is a bedrock requirement of copyright,” wrote Judge Beryl Howell, a rule derived from “centuries of settled understanding.” And there's a practical reason too: “Non-human actors need no incentivization with the promise of exclusive rights under United States law, and copyright was therefore not designed to reach them.”
The non-human actor, then, isn't an author, and doesn't act from incentive, still less from excitement about the underlying work; whatever the virtues of a generative AI, it can hardly be described as a fan. And it wouldn't require a huge tweak for the algorithms to respond to certain queries with, “I'm sorry, but I'm not permitted to create fictional works that are derivative of copyrighted works.”
In a 2019 filing with the US Patent and Trademark Office, OpenAI argued that training LLMs on copyrighted works was fair use, in particular because any other rule would set back research into artificial intelligence. The principal justification is clear in the conclusion: “We hope that US policymakers will continue to allow this area of dramatic recent innovation to proceed without undue burdens from the copyright system.”
What the Authors Guild is trying to do is turn this around a little, to remind developers and users alike that we'd also be wise to allow good old-fashioned human authorship to proceed without undue burdens from AI.
Stephen L. Carter is a Bloomberg Opinion columnist, a professor of law at Yale University and author of “Invisible: The Story of the Black Woman Lawyer Who Took Down America's Most Powerful Mobster.”