With judgment now reserved in ANI v OpenAI, India stands at the cusp of what might be its first major judicial reckoning with the copyright implications of generative AI. The case raises foundational questions on whether AI systems merely process information in new ways or unlawfully appropriate protected expression. Vishno Sudheendra examines two of the most contested issues from the final hearings: chatbot web search functionality and memorization. Vishno is a fourth-year B.A., LL.B (Hons) student at the National Law School of India University, Bangalore, with a keen interest in various aspects of IPR and technology law.

The judgment in ANI v OpenAI [CS(COMM) 1028/2024] has been reserved, with the final hearing concluding on 27th March, 2026 [order]. This litigation is the first (and, so far, the only) copyright and AI training-related litigation in India. The DPIIT Working Paper had also left open the question of whether AI training infringes copyright, noting that it “does not attempt to resolve these questions or offer definitive conclusions” on this issue. Thus, the verdict in this case will shape the legal course of copyright and AI training-related issues in India. [Readers can view previous updates and analysis by Bharathwaj here]
In this post, I seek to discuss two interesting issues raised in the last two hearings: the web search functionality of AI chatbots and memorization. The search functionality allows chatbots to access real-time information from the web using Retrieval Augmented Generation (“RAG”); essentially, this complements their training data, which is updated only periodically. ANI argued that this functionality is infringing and also drives down web traffic to its website, reducing readership. The second issue is memorization, a phenomenon where LLMs regurgitate their training data verbatim. GEMA v OpenAI (2025), a ruling of a Munich regional court in which OpenAI was held liable for ChatGPT reproducing the lyrics of a song, was also cited.
I argue that the search functionality of AI chatbots remains essentially non-expressive use, no different from the training process, and that the regurgitation caused by memorization occurs in most cases due to adversarial prompting. The user may therefore be held liable if they publish the output (if they do not, the private/personal use exception might apply), while the AI developers are not liable, given that they have employed guardrails which users are circumventing. I further argue that secondary liability must also not be attributed to AI developers, since these chatbots have substantial non-infringing uses and were not designed with an intent to infringe (barring situations where adequate guardrails are lacking and the infringing output is produced even absent adversarial prompts).
The Web Search Functionality & RAG:
The training data of LLMs is updated only periodically, which creates an information/knowledge gap in the intervals between updates. This gap is bridged by extracting information from real-time web searches (indexed in the model’s knowledge base) using Retrieval-Augmented Generation (“RAG”). RAG enables a model to reference “an authoritative knowledge base outside of its training data sources before generating a response”. This helps mitigate common LLM problems like generating false information when the model lacks a coherent answer, generalised/vague outputs, reliance on non-authoritative sources, etc. (read more here and here). Here is an example (where you can see ChatGPT referencing various websites):

ChatGPT has a web search functionality, and it provides the website links (URLs) of the sites it has accessed in the process of generating its response (as shown above). ANI, in its submissions against OpenAI on 20th March 2026, argued that this drives down its web traffic and causes loss of readership/revenue. OpenAI restricted the applicability of this functionality to ANI websites during the course of this litigation; when ChatGPT is prompted to provide updates using only ANI as a source, it is unable to do so and provides the following disclaimer:

But before examining ANI’s claims that the functionality drives down web traffic, readership and revenue, let’s take a step back and ask whether this is a copyright issue at all. The way information extracted via RAG is processed is, as regards its non-expressive character, no different from the regular training process. The first step of RAG, which goes beyond the training data by accessing an external knowledge base, involves the following:
“RAG systems use a process called embedding to transform data into numerical representations called vectors. The embedding model vectorizes the data in a multidimensional mathematical space, arranging the data points by similarity. Data points judged to be closer in relevance to each other are placed closely together.” [IBM]
Such a process is nothing but a non-expressive use, where the protectable elements of a copyrighted work are not copied/reproduced, but only unprotectable elements, like the statistical relationship between words, are noted. [to read more on non-expressive use and how general AI training is non-infringing, see Shivam Kaushik (Part 1 and Part 2) and Akshat Agrawal, and this illustrative article]
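The embedding-and-retrieval step described in the IBM passage above can be sketched in a few lines of Python. This is a toy illustration, not OpenAI’s actual implementation: the hand-made three-dimensional vectors stand in for a real embedding model’s output, and all names are hypothetical. The point it demonstrates is that retrieval operates purely on numerical similarity between vectors — statistical relationships, not protected expression:

```python
import math

# Toy "embeddings": in a real RAG system these vectors come from an
# embedding model; here they are hand-made stand-ins for illustration.
doc_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
    "doc_c": [0.2, 0.2, 0.9],
}

def cosine_similarity(u, v):
    # Similarity of two vectors, ignoring magnitude: 1.0 = same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query_vector, k=1):
    # Rank documents by similarity to the query vector and return the top-k.
    ranked = sorted(
        doc_vectors,
        key=lambda d: cosine_similarity(query_vector, doc_vectors[d]),
        reverse=True,
    )
    return ranked[:k]

print(retrieve([0.85, 0.15, 0.05]))  # ['doc_a'] — nearest in vector space
```

Note that nothing in this pipeline reproduces the text of the documents themselves: the knowledge base is arranged and queried by geometric proximity of vectors, which is the sense in which the retrieval step is non-expressive (the expressive content enters the picture only at the later generation stage).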
Therefore, given the non-expressive use embodied in RAG and the web search functionality, where only unprotectable elements of a copyrighted work are used, the alleged loss of revenue/readership/web traffic stems not from copyright infringement but from legitimate competition! Copyright is not a right that guarantees readership, revenue, web traffic, or a monopoly against legitimate competition [there is no copyright beyond what is provided in the Copyright Act – Section 16]. Thus, the moot question is whether any of the Section 14 rights have been infringed, and there exist extremely compelling arguments against any such infringement taking place [See Shivam Kaushik (Part 1 and Part 2) and Akshat Agrawal].
Another interesting argument in the specific context of this case was advanced by Amicus Prof. Arul Scaria, who noted that the merger doctrine may apply here, since news or facts can often be expressed in only a limited number of ways (the merger doctrine holds that when an idea can be expressed in very few ways, the expression is deemed to have “merged” with the idea and is therefore not protectable). Prof. Edward Lee, in the context of the fourth fair use factor – market harm – notes that “As long as the summary or output of an AI generator does not copy original expression from online news sources, but copies merely facts, the dissemination of such facts does not produce a cognizable harm under Factor 4 of fair use in “the protected aspect” of the underlying work” [Edward Lee].
Memorization:
Memorization in AI refers to a model’s tendency to store and reproduce specific pieces of information from its training data, rather than learning generalizable patterns. Numerous studies have shown how memorization occurs [see Carlini et al., Nasr et al., Cooper et al.]. In most instances, memorization leading to verbatim regurgitation of copyrighted works does not occur automatically; users must employ extractive prompts designed to circumvent the guardrails these chatbots have in place (known as adversarial prompts) to attain such verbatim reproduction. In rare cases, however, simple prompts like ‘provide the lyrics of xyz song’ may also lead to verbatim reproduction, as seen in GEMA v OpenAI.
This problem is compounded because, in copyright law, not only verbatim reproduction of copyrighted material but even substantial reproduction of it is considered infringing [R.G. Anand v. M/S. Delux Films].
So should AI developers be held liable for such reproduction? My answer is qualified: in cases where sufficient guardrails exist and are being circumvented by users through adversarial prompts, liability must lie not with the AI developers but with the users (because, apart from AI chatbots having substantial non-infringing uses, akin to intermediaries, the AI developers have done their due diligence, and it is the users who deliberately circumvent it). In cases where chatbots reproduce copyrighted material verbatim [or material bearing substantial similarity to copyrighted works] in response to simple prompts (without extractive techniques employed by the users), liability should vest with the AI developer.
Essentially, the nature of liability here would be indirect or contributory liability, as AI developers can argue that there was no volitional conduct on their part and they had employed sufficient guardrails – “Lack of foreseeability or reasonable copyright safety measures effectively breaks the chain of causation required for volitional conduct” [Matthew Sag].
Note: In the context of ANI v OpenAI, Amicus Adv. Adarsh Ramanujan pointed out that ANI does not seem to have made a case for memorization from what has been presented. I agree with Mr. Ramanujan that demonstrating memorization would require isolating the model from external sources, such as by disabling web search functionality, to assess whether the LLM reproduces ANI’s copyrighted works from its training data, either verbatim or in a substantially similar form.
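The kind of isolation test Mr. Ramanujan describes would, in essence, compare the model’s output (with web search disabled) against the plaintiff’s works and measure verbatim overlap. A crude sketch of such a comparison is below — real memorization studies (e.g. Carlini et al.) use more sophisticated extraction and matching techniques, and all strings and thresholds here are illustrative:

```python
def ngrams(text, n=5):
    # Set of all consecutive n-word sequences in the text.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(source, output, n=5):
    # Fraction of the output's n-grams that appear verbatim in the source.
    # High values suggest verbatim (or near-verbatim) reproduction.
    src, out = ngrams(source, n), ngrams(output, n)
    return len(out & src) / len(out) if out else 0.0

source = "the quick brown fox jumps over the lazy dog near the river bank"
verbatim = "the quick brown fox jumps over the lazy dog"
paraphrase = "a fast brown fox leapt over a sleepy dog by the river"

print(overlap_ratio(source, verbatim))    # 1.0 — every 5-gram is copied
print(overlap_ratio(source, paraphrase))  # 0.0 — no 5-gram is copied
```

A verbatim-overlap metric like this would capture only exact regurgitation; demonstrating “substantial similarity” in the R.G. Anand sense is a qualitative legal inquiry that no n-gram count can settle on its own.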
Contributory Infringement for Memorized Infringing Outputs?
It may be argued that AI developers have contributory liability for enabling the generation of infringing outputs (material contribution) and that they have knowledge of their product’s ability to be used for copyright infringement. However, this liability should be qualified: where adequate safeguards are deliberately circumvented through adversarial prompting, responsibility ought to lie with the user; conversely, where infringing outputs are generated in response to simple prompts, liability should vest with the developer.
Matthew Sag argues that AI developers might find recourse in the observation of SCOTUS in Sony Corp. of America v. Universal City Studios, Inc (“Betamax case”), where it was observed that general knowledge of a product’s ability to infringe copyright is not sufficient to constitute contributory liability. The SCOTUS, in the Betamax case, famously observed that “the sale of copying equipment, like the sale of other articles of commerce, does not constitute contributory infringement if the product is widely used for legitimate, unobjectionable purposes. Indeed, it need merely be capable of substantial noninfringing uses.”
Similarly, AI chatbots are widely used for legitimate, unobjectionable purposes and are plainly capable of substantial noninfringing uses.
In India, Section 51(a)(ii) of the Copyright Act may be invoked against AI developers; it provides that copyright is deemed to be infringed if any person without a license “permits for profit any place to be used for the communication of the work to the public where such communication constitutes an infringement of the copyright in the work, unless he was not aware and had no reasonable ground for believing that such communication to the public would be an infringement of copyright”.
The latter part of the provision provides a defence based on lack of knowledge or reasonable grounds of belief. Interpreting this provision, the Division Bench of the Delhi High Court in MySpace v. Super Cassettes clarified that “knowledge” requires actual consciousness or awareness, and not the mere possibility or suspicion of something likely; mere suspicion or apprehension is not enough. “Knowledge is to be therefore placed pragmatically in the context of someone’s awareness (i.e a human agency); a modification on the technical side by use of software would per se not constitute knowledge” [MySpace, Paras 35-37].
Applied to generative AI systems, the MySpace standard suggests that platform-level awareness of theoretical infringement risks does not equate to legal “knowledge” under Section 51(a)(ii). Thus, contributory infringement may be ruled out on this front as well. However, as mentioned earlier, this defence would be available only where the output was generated using adversarial prompts circumventing the guardrails employed by the developers; if the AI developers had employed no (or only flimsy) guardrails in the first place, this defence might not be available, given the foreseeability of infringement.
What About the Users?
Users who undertake adversarial prompting to make the chatbots regurgitate verbatim copyrighted outputs may claim the exception under Section 52(a)(i) which allows “fair dealing with any work, not being a computer programme, for the purposes of— (i) private or personal use, including research”. Authors and publishers would counter this by arguing that such use must nevertheless involve “lawful” access. But here is the conceptual problem: if every enumerated exception required lawful access irrespective of statutory wording, what would be the point of specifying exceptions at all?
Further, as Amicus Prof. Scaria pointed out to the Court in a different context, lawful access is not a general requirement under Section 52. He further stated that the Copyright Act expressly includes a “lawful access” requirement only where framers of the statute intended it, for example, in Sections 52(1)(aa), (ab), and (ad). Reading the same requirement into provisions where it is not mentioned violates the statutory-interpretation principle of expressio unius est exclusio alterius (the express mention of one thing implies the exclusion of another).
Interested readers may also see Yogesh’s post on lawful access in the context of the DPIIT Working Paper.
Conclusion:
As we await the reserved judgment in ANI v. OpenAI, the legal stakes extend far beyond the immediate dispute. The case presents the judiciary with its first opportunity to clarify how copyright law should respond to AI training. As this post argues, neither RAG-based search functionality nor memorization (when triggered through adversarial prompts) fits comfortably within the contours of copyright infringement. These phenomena involve either non-expressive uses or user-driven circumvention of guardrails, and AI developers’ tools continue to demonstrate substantial non-infringing uses that weigh against secondary liability. It will be all the more intriguing to see how the case unfolds, especially since Justice Amit Bansal noted during the proceedings that he would not rely on foreign precedents and intends to assess the issues solely through the lens of the Indian Copyright Act.
I would like to thank Priyam Mitra for sharing his notes of the last ANI v OpenAI hearing.
I would like to thank Swaraj Barooah and Praharsh Gour for their valuable inputs and review of this post.

Interesting post! RAG also takes out one of the most important and strong arguments that could be advanced for training as infringement. As Adv. Adarsh Ramanujan had argued, the curation of datasets for pre-training containing copyrighted works constitutes “storage” for the purpose of Section 14 and hence infringement under Section 51. This argument conveniently bypasses the non-expressive use argument by locating the act of dataset curation for pre-training under Section 14.
With RAG, even that argument will fail, since there is no curation of a dataset nor further storage: a GenAI model using RAG relies on an external database that is already stored and hosted on the internet, available for conversion into matrices as fodder for GenAI inference.