Malamud’s New TDM Venture May Not Be Shielded by Section 52 of the Copyright Act

Carl Malamud is at it again, riling up copyright owners and setting the stage for yet another lawsuit on a fascinating copyright issue. His latest effort to ‘liberate’ copyrighted information from scientific publishers is an enterprising venture called the ‘JNU Data Depot’, which is based out of the Jawaharlal Nehru University (JNU) and which was the subject of an excellent report by Priyanka Pulla in Nature. In the past, Malamud has provoked a firestorm of debate in multiple jurisdictions, including India, with regard to the accessibility of legally binding standards whose copyright is owned by private standard-setting bodies. A few months ago, the New York Times published an editorial backing Malamud in his litigation against legal publishers, which is to be heard by the United States Supreme Court in its next term.

The JNU Data Depot

The ‘JNU Data Depot’ consists of 73 million journal articles (576 terabytes of information), which were compiled by Malamud from an undisclosed source and stored on a few disk drives which he has housed in an ‘air gap’ facility in JNU, which means that it cannot be accessed through the internet. Researchers have to visit the facility in person. Malamud’s reluctance to disclose the source of his papers does not sit well with his stated commitment to Gandhian principles of resistance and the complete truth.

Malamud intends to make the JNU Data Depot available to scientific researchers for text and data mining (TDM). Simply put, TDM is the practice of using automated tools to analyse large amounts of text or data to track useful trends or patterns. It has the potential to dramatically improve the quality of scientific research. Most journal publishers do offer API access to subscribers wanting to conduct TDM research, but they cannot match the scale of the JNU Data Depot because each publisher can provide the TDM facility only for its own journals and not for journals owned by other publishers. With the JNU Data Depot, researchers will now be able to conduct specific searches over the 73 million articles. It is not clear how much information will be shown in the search snippets. Will it be Google Books-style snippets or an entire page of a journal article? Depending on how much information is displayed in the search results, researchers may require access to the entire journal article, which I presume they will obtain either through an authorized subscription or through Sci-Hub, the pre-eminent database of pirated academic papers. The papers on Sci-Hub, however, are not searchable and are usually accessed through their DOIs. Sci-Hub combined with the JNU Data Depot facility would be a potent alternative for those lacking access to the expensive databases of individual publishers. If the JNU Data Depot is deemed to be legal under Indian law, Indian universities should seriously reconsider their subscriptions to the very expensive academic databases.

Can the Depot be Defended under the Copyright Act?

The question now is whether JNU and Malamud are liable for copyright infringement. The depot, after all, hosts 73 million articles without the permission of the copyright owners. Even if the full text of the articles is not being made available to visitors, the very act of copying 73 million articles constitutes copyright infringement unless Malamud and JNU can establish a clear defence under Section 52 of the Copyright Act. The fact that only snippets will be shown helps their case, although we have no idea of the size of these snippets. Even then, it is up to Malamud and JNU to identify the provision of law on which they intend to rely in defending a possible claim of copyright infringement.

According to Malamud’s statement to Nature, since the JNU Data Depot is showing only snippets of the papers, it would not be liable for copyright infringement under American copyright law. Malamud may be right about the position of law in the US, given the unique ‘fair use’ provision in American law which in the past has protected Google Books. However, as the Nature article rightly points out, American law is irrelevant because the depot is located within India and will be governed by Indian copyright law. Indian law is very different on the point. Unlike the freewheeling American ‘fair use’ provision, which only lays down a set of principles that are to be applied in various scenarios, Indian law lays down specific uses which are exempted from copyright infringement.

Private and Personal Use?

The second defence of Malamud’s data depot has been put forth by Arul George Scaria in a lengthy and wonderfully articulated post yesterday on SpicyIP. The long and short of Arul’s defence of the data depot rests on the “fair dealing” limitation articulated in Section 52(1)(i) for the purposes of “private or personal use, including research”. He argues as follows:

“The Indian fair dealing provision specifically includes private or personal uses, including research, as a purpose for which the fair dealing provision is applicable. The access restrictions explicitly put in by the data depot indicates that the users of the facility will be using it only for research purposes, and only in their personal capacity. As the activity in question is for a purpose specifically mentioned in the provision, the second requirement is also met.”

While Arul may be right in saying that the researcher using the facility may be covered under the exception on the grounds that it is ‘personal use’ (although that is debatable), it is clear as day that neither Malamud nor JNU can use this defence, because the data depot is sitting in a public university, with an invitation for any researcher to use it. This is the definition of ‘public use’. If the data depot does not qualify as either ‘private’ or ‘personal’, there is simply no point in conducting a ‘fair dealing’ analysis because the main requirement of the provision, regarding the nature of the use, is not met.

Neither Malamud nor Arul has identified any other provision of the Copyright Act that can provide a safe harbour for the JNU data facility.

JNU’s Liability

Separate from Malamud, there is also the question of JNU’s liability. Unlike Malamud, JNU has multiple subscription contracts with academic publishers for legitimate access to their databases. By hosting the data depot facility, JNU may be in violation of those contracts and may incur significant liability if the publishers decide to initiate legal action under them. Since most of these contracts have arbitration clauses requiring arbitration to take place under foreign laws and in foreign jurisdictions, JNU could be looking at a pretty hefty legal bill. However, it makes little commercial sense for publishers to take the trouble of initiating legal action to take down the JNU data depot as long as Sci-Hub, which is far more widely used in India, continues to thrive. A study in 2016 revealed that Indians made 3.4 million downloads from Sci-Hub. Only the fear of laches may spur the publishers to file a lawsuit.

Prashant Reddy

T. Prashant Reddy graduated from the National Law School of India University, Bangalore, with a B.A.LLB (Hons.) degree in 2008. He later graduated with an LLM degree (Law, Science & Technology) from the Stanford Law School in 2013. Prashant has worked with law firms in Delhi and in academia in India and Singapore. He is also co-author of the book Create, Copy, Disrupt: India's Intellectual Property Dilemmas (OUP).


  1. Rohit

    Web scraping, text mining or data extraction is a technology which is still in a grey zone as far as its legality is concerned. However, in my humble opinion, most jurisdictions the world over are willing to make exemptions for educational and non-commercial research use.

