UPDATED 12:30 EDT / DECEMBER 27 2023

New York Times sues Microsoft, OpenAI over AI training copyright infringement

The New York Times sued Microsoft Corp. and ChatGPT developer OpenAI today, alleging that the two companies copied and used “millions” of its articles to train their artificial intelligence models without permission.

The Times filed the lawsuit in a New York federal court, alleging that the two companies committed wide-scale copyright infringement. The complaint pointed out that although generative AI models are often trained on content scraped from many sources across the internet, OpenAI and Microsoft “gave Times content particular emphasis” when building their AI models.

Large language models, such as those that underpin OpenAI’s ChatGPT and Microsoft’s Copilot AI chatbots, require vast amounts of human-generated content for training in order to develop their humanlike conversational capabilities. That training, in turn, allows them to more accurately summarize and provide insights from ingested content. That’s why LLM providers gather data from a broad variety of sources.

According to the Times’ complaint, after the newspaper discovered that its content had been ingested by the companies’ LLMs, it attempted to reach an agreement regarding its use.

“For months, The Times has attempted to reach a negotiated agreement with Defendants, in accordance with its history of working productively with large technology platforms to permit the use of its content in new digital products,” the complaint states. “These negotiations have not led to a resolution.”

The Times said that OpenAI and Microsoft claimed “fair use,” a legal doctrine that allows the use of copyrighted works without permission when the new use transforms the original substantially enough. The Times argued that the outputs from ChatGPT and Copilot were so similar to the original content that they couldn’t be considered fair use.

As part of the complaint, the Times produced examples of verbatim segments from its articles, which the outlet said were significantly longer than what would be displayed in a traditional search engine. The lawsuit also noted that, unlike search results, the AI output does not include citations or hyperlinks that would lead users to the Times website, thereby reducing traffic.

The Times also alleged the AI models in question would sometimes “hallucinate,” or provide false statements, and attribute them incorrectly to Times sources. According to the complaint, these inaccurate statements, especially mixed with verbatim content, could damage the reputation of the news outlet.

OpenAI has entered into multiple content licensing deals this year with online content producers, including the Associated Press and Axel Springer SE, the parent company of Politico. The deal with Axel Springer would not provide full articles to AI models but instead summaries of stories.

In August, the Times, CNN, Reuters and other news organizations blocked OpenAI’s web crawler, which scrapes publicly available but copyright-protected information from the internet, from reading their websites.

This lawsuit adds to a growing list of copyright infringement court battles in which OpenAI and Microsoft are embroiled. Last month, a group of nonfiction authors sued the two companies, alleging that their books had been used to train the AI models without their permission. Earlier, fiction authors John Grisham and George R.R. Martin were among 17 authors who sued OpenAI for “systematic theft on a mass scale,” citing similar copyright allegations.

In its complaint, the Times did not state any specific amount for damages but said it would be seeking attorney’s fees and restitution, which the lawyers said could amount to “billions of dollars” in statutory and actual damages. The lawsuit also calls for the destruction of all AI models and training sets that incorporate Times content.

Photo: Pixabay
