Authors allege Meta used pirated books to train AI systems with Zuckerberg's approval.

A group of authors, including Ta-Nehisi Coates and comedian Sarah Silverman, has accused Meta Platforms (META.O) of using pirated versions of copyrighted books to train its AI systems, with the alleged approval of CEO Mark Zuckerberg. These claims were disclosed in court filings made public on Wednesday in a California federal court.

Allegations and New Evidence

The authors, who initially sued Meta in 2023 for copyright infringement, now argue that internal documents obtained during the discovery process show Meta knowingly used pirated works from the LibGen dataset. LibGen, a dataset alleged to contain millions of pirated works, was reportedly distributed through peer-to-peer torrents.

Court documents reveal internal Meta communications indicating Zuckerberg approved the use of the dataset despite concerns raised by Meta’s AI executive team about its legality.

Background of the Case

The lawsuit centers on Meta’s use of copyrighted material to train its large language model, Llama. Meta has previously argued that such usage falls under the “fair use” doctrine, a defense also employed by other tech companies facing similar lawsuits.

In 2023, U.S. District Judge Vince Chhabria dismissed claims that text generated by Meta’s AI chatbots infringed copyrights and that the company unlawfully removed copyright management information (CMI) from authors’ works.

Renewed Legal Push

The authors are now seeking to amend their complaint based on the newly surfaced evidence. They aim to revive their CMI claim, bolster their copyright infringement claims, and introduce a new claim for computer fraud.

Judge Chhabria, while allowing the filing of an amended complaint, expressed skepticism about the viability of the fraud and CMI claims.

Meta’s Response

Meta has not yet commented on the latest filings.

Broader Implications

This case is part of a larger wave of lawsuits accusing tech companies of misusing copyrighted works to develop AI products without consent. The outcome could have far-reaching implications for the AI industry, particularly regarding how training datasets are sourced and used.

The legal battle highlights the tension between innovation in AI and the protection of intellectual property rights.