Anthropic, the second most famous AI startup after OpenAI, has decided to cut one of the great Gordian knots of our time. The company believes that the data used to train neural networks should not be protected by copyright, and that model developers should not have to pay to use it. Otherwise, it argues, products based on artificial intelligence simply could not exist.
Anthropic's representatives made this argument in court, in a lawsuit filed against the company in October 2023 by a group of music publishers that includes UNIVERSAL MUSIC CORP. The rights holders are seeking compensation for the use of their song lyrics in training datasets. The plaintiffs claim that Anthropic illegally copied and distributed "myriads" of lyrics and are seeking damages of "up to $150,000 per composition." They do not see why one company should be able to use another company's product for free to build its own business.
Anthropic responded that when AI models are trained, copyrighted works are used to analyze statistical relationships between words and concepts, a purpose unrelated to the one for which the works were created. Moreover, this use in no way prevents copyright holders from continuing to earn money from their intellectual property.
Requiring payment for copyrighted content, the company argues, would mean that only wealthy companies could afford to use it.
In a filing dated January 16, 2024, Anthropic set out its arguments in more detail. It began by pointing out that it is not the only company that collects data from open sources on the Internet.
“From a practical point of view, there is no other way to obtain training data of the scale and variety needed to train a sophisticated LLM that understands human language and the world at large.”
Another argument: so much data is needed that no effective mechanism for paying for the use of intellectual property could realistically be built:
“It is impossible to enter into licensing agreements with enough rights holders to cover the billions of texts needed to generate the trillions of tokens required to properly train a general-purpose LLM. If training required licensing of copyrighted content, current generative AI tools simply could not exist.”
Anthropic also tried to reassure copyright holders by pointing to mechanisms built into its models that limit the use of copyrighted content. The model is supposed to recognize such content and not reproduce it beyond certain limits.
Admittedly, the company conceded, these mechanisms do not work perfectly. Users occasionally manage to bypass them and get the model to output a complete fragment of copyrighted content.
Anthropic believes that using copyrighted content to train LLMs should be considered fair use (the fair use doctrine provides limitations and exceptions to copyright under certain conditions, in particular when the use benefits society or promotes the progress of science or the arts). US law, where the doctrine originated and where it applies (which leaves open how the same problem will be resolved in other jurisdictions), provides a four-factor test for determining whether a given use qualifies as fair use.
Meanwhile, in the United States, The New York Times rejected half-measures and sued OpenAI, asking the court to order ChatGPT destroyed. The NYT claims that the neural network was illegally trained on the publication's content and that it now produces answers that reproduce this content.
According to the plaintiff, OpenAI should delete the illegally used data as well as the GPT-4 language model itself. Under US copyright law, a court can indeed order the destruction of infringing copies, as well as the equipment used to make them.
A roughly similar precedent already exists: the Federal Trade Commission has forced several companies, including the weight loss service Weight Watchers, to delete illegally collected data and the algorithms built on it. On the other hand, the law permits the destruction of infringing products only if this is the “sole means” of stopping the infringement.