The rise of the Internet has led to the creation of a vast repository of data residing across servers and domains. This repository contains large datasets that include “publicly available information.” Such information includes time-sensitive information such as news, financial information and data, reviews, auction information, and multiple other categories. Because the information is public, and because of the same information technology tools that gave rise to the Internet in the first place, it has become extremely convenient to extract this publicly available information as required.
The process of automatic content extraction from publicly available servers is usually referred to as data extraction, scraping or harvesting. The only cost of extracting the data is the cost of the computer system and the time required to program it. Hence content extraction can be extremely lucrative for those who deal in data sets and their resale, usually for time-sensitive information. Some data security service providers estimate that up to 40% of a website's traffic comprises data extractors. The same providers also suggest that websites actively try to stop data extraction because of the heavy toll it takes on their computational resources: servers can be slowed down and bandwidth soaked up by extractors scouring every webpage for data.
The data extraction process creates legal issues and concerns for both sides: those who want to extract data, and those who want to protect against the extraction of data. This post provides a basic background on the laws applicable to data extraction in India, and an overview of the remedies available to both the content creator and the content extractor.
Consider the following scenarios, which are hypothetical except where indicated. In all of these scenarios, freely available content could be taken from the site of a content provider and then used for resale. In the first scenario, freely available financial data is taken from a major content provider and repackaged for sale for a fee. The second scenario involved an actual dispute between Craigslist and Padmapper: Padmapper created an interface that took data from Craigslist and then displayed the same data on its own interface. The third scenario may involve getting data from various betting sites, repackaging it, and then selling it to consumers. The fourth (again an actual scenario) involved getting fare data from a travel website and then reselling it.
Copyright laws: Copyright Act, 1957
Data extraction involves copying, and hence copyright laws are the first ones to be analysed. Section 2(o) of the Copyright Act, 1957 defines a data compilation (or data set) as a “literary work”. Section 14 of the Copyright Act, 1957 further grants several exclusive rights in favour of the copyright holder (content creator) as the first owner of such copyrighted works (the data compilation / data set), namely:
a. Right to reproduce data, including storing it by any electronic means;
b. Right to make copies of data;
c. Right to adapt data;
d. Right to communicate data to the public; and
e. Right to translate data.
Section 51 of the Copyright Act further provides that a copyright is “deemed to be infringed” if any of the above enumerated rights under Section 14 are contravened without the permission of the copyright holder in the course of trade.
However, two points should be ascertained before determining infringement: ownership, and the absence of a fair use exception. Only the copyright holder / content owner can raise a claim. Hence, in the case of a content aggregator serving various users, it is the users who own the copyright and not the content aggregator. This scenario occurs for websites where users generate the content and the website merely organizes the display / formatting of the content. Section 52 of the Copyright Act lists various exceptions to copyright infringement, and care should be taken to confirm that the extracted content has not been used for any of the purposes outlined for fair dealing.
Information Technology Act, 2000, as amended (“IT Act”)
Section 10A of the IT Act provides for the validity of contracts formed through electronic means: “Where in a contract formation, the communication of proposals, the acceptance of proposals, the revocation of proposals and acceptances, as the case may be, are expressed in electronic form or by means of an electronic record, such contract shall not be deemed to be unenforceable solely on the ground that such electronic form or means was used for that purpose.”
Accordingly, clickwrap, browsewrap and other means of contract formation on the internet are covered under this section, and most websites provide services to consumers under one of these means of contract formation. For example, if a person has to accept the terms of service by clicking “I Agree” or typing in “I Agree”, it is commonly known as a clickwrap agreement. Under a browsewrap agreement, a user may continue to use / browse a content owner's website, and the user's consent to the terms of the website is implied because the user continues to browse the website. In India, there are no judicial precedents involving a browsewrap or clickwrap agreement / contract.
Section 43 of the IT Act provides for a penalty in case a computer system is damaged. Section 43 also provides the relevant definitions to assess damage. The parts relevant to data extraction are reproduced below:
43. Penalty for damage to computer, computer system, etc.- If any person without permission of the owner or any other person who is in charge of a computer, computer system or computer network,
(a) accesses or secures access to such computer, computer system or computer network;
(b) downloads, copies or extracts any data, computer data base or information from such computer, computer system or computer network including information or data held or stored in any removable storage medium;
(c) introduces or causes to be introduced any computer contaminant or computer virus into any computer, computer system or computer network;
(d) damages or causes to be damaged any computer, computer system or computer network, data, computer database or any other programmes residing in such computer, computer system or computer network;
(e) disrupts or causes disruption of any computer, computer system or computer network;
(f) denies or causes the denial of access to any person authorised to access any computer, computer system or computer network by any means;
(g, h)….
Explanation. For the purposes of this section:
(i) “computer contaminant” means any set of computer instructions that are designed (a) to modify, destroy, record, transmit data or programme residing within a computer, computer system or computer network; or (b) by any means to usurp the normal operation of the computer, computer system, or computer network;
(ii) “computer database” means a representation of information, knowledge, facts, concepts or instructions in text, image, audio, video that are being prepared or have been prepared in a formalised manner or have been produced by a computer, computer system or computer network and are intended for use in a computer, computer system or computer network;
(iii) “computer virus” means any computer instruction, information, data or programme that destroys, damages, degrades or adversely affects the performance of a computer resource or attaches itself to another computer resource and operates when a programme, data or instruction is executed or some other event takes place in that computer resource;
(iv) “damage” means to destroy, alter, delete, add, modify or re-arrange any computer resource by any means.
Section 66 of the IT Act provides for imprisonment for a term extending to three years, or a fine of up to Rupees Five Lacs, or both, for the acts referred to in Section 43.
In a case where data is extracted, the following infractions arise under the provisions of Section 43: (a) accessing or securing access to computers, computer systems or computer networks; and (b) downloading, copying or extracting data or data base information from computers, computer systems or computer networks.
However, what is problematic is clause (c). In the absence of any guideline, an argument could be made that repeated access from a computer system to a content owner's database or databases overloads the content owner's database system and the computer systems hosting that database. This repeated access could be characterized as a computer contaminant or computer virus. In addition, if a content owner has to separately provision additional server space, or devote additional servers / resources to cater to the content extractor, then the content extractor could be considered a computer contaminant / virus, as the actions of the content extractor degrade the performance of the content owner's servers.
It is expected that once Indian courts are seized of a matter involving data extraction, they may issue certain guideposts that help in determining whether servers are overburdened, or whether the performance of a content server is degraded. In issuing such guideposts, courts may consider what other jurisdictions are doing. For example, the United States, under the Computer Fraud and Abuse Act, provides for a minimum amount of damages of at least five thousand dollars ($5,000) over a one-year period (18 U.S.C. §1030(a)(4)).
A technological measure to thwart would-be data extractors could be to alter the Robots file (robots.txt) to disallow automated bots from crawling the content owner's website. Another could be to use CAPTCHA technology to distinguish between individual access and bot-based access for data extraction.
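As an illustration of the first measure, the following is a minimal sketch of how Robots file rules are interpreted, using Python's standard `urllib.robotparser` module. The rules, the `/listings/` path, the `example.com` domain and the bot names `GenericScraper` and `ExampleSearchBot` are all hypothetical and chosen only for illustration. Note that a Robots file is advisory: it tells well-behaved bots what the content owner permits, but it does not technically block a bot that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical Robots file a content owner might publish:
# all bots are barred from /listings/, except one named search bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /listings/

User-agent: ExampleSearchBot
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse Robots file text supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic extractor is told not to crawl the listings pages...
print(parser.can_fetch("GenericScraper", "https://example.com/listings/123"))
# ...but may still read pages that are not disallowed.
print(parser.can_fetch("GenericScraper", "https://example.com/about"))
# The explicitly permitted bot may crawl the listings.
print(parser.can_fetch("ExampleSearchBot", "https://example.com/listings/123"))
```

A content extractor that wishes to show it respected the content owner's stated permissions could run a check of this kind before each request; a content owner could use the same check to confirm that the published rules say what was intended.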