The Committee of Experts on Non-Personal Data (NPD) Governance Framework, headed by Kris Gopalakrishnan and constituted by the Ministry of Electronics and Information Technology, released its first Report in July 2020. I had written about the Report for SpicyIP here. In December last year, a revised Report (“the Report/Revised Report” hereinafter) was released by the Committee with the goal of unlocking economic benefit from non-personal data, creating a data sharing framework, establishing community-based rights over NPD and addressing potential harms to privacy due to misuse of data (3.5, Revised Report). The Committee was constituted for the stated purpose of making suggestions to the Central Government on regulation of non-personal data (1.1, Revised Report). The Kris Gopalakrishnan Committee (“the Committee” hereinafter) received more than 1500 representations and submissions from industry bodies, civil society, independent experts and companies, based on which it released its Revised Report. The deadline for public submissions on the Revised Report was 27 January 2021.
The revised NPD framework retains the heavy-handed regulatory architecture of its initial version, albeit with a few clarifications. In Part I of this two-part post, I discuss concerns regarding possible copyright and trade secret protection over the data mandated to be shared, which the Report either overlooks or misunderstands. I also analyse the shortcomings of a community rights framework in the context of data. In Part II, I criticise the Committee’s justifications for overriding possible IP protection over NPD. I conclude by exploring alternatives to the Committee’s approach and discuss how competition law could be utilised for furthering the goals of the Committee.
Data Sharing For High Value Datasets
The revised report defines data sharing in terms of controlled access to non-personal data for certain purposes. However, these purposes are quite vague.
Firstly, ‘sovereign purpose’ is recognised as a legitimate ground to gain access to a combination of personal and non-personal data, but the validity of a request will not be scrutinised by the NPD Authority, which is a specialised regulatory body meant to enforce the NPD framework (8.1[vi], Revised Report). The risk that this will strengthen the State’s already potent surveillance architecture without accountability has been noted here (pages 10-11).
Secondly, the Report authorises the creation of High Value Datasets (HVDs) for ‘public good’, which is once again very vaguely defined, and includes research, innovation, policy development, devising public programmes, infrastructures etc. Any organisation registered in India can request access to data in HVDs (8.2[v], Revised Report). The Report does not safeguard against an entity using a vague public purpose as a proxy for gaining access to HVDs and then utilising the data for purely commercial purposes, as there are no end-use restrictions on an entity that gains access to data for a supposedly public purpose. Leaving aside the feasibility of such restrictions, it is predictable that organisations will not just incidentally use the data shared with them for profit-seeking, but will also make requests that meet the ‘granularity’ or ‘specificity’ criteria (left undefined in the Report) solely for profit-seeking purposes under the garb of artificially contrived public good purposes.
Data sharing between two or more for-profit private entities has been done away with under the new report (8.3, Revised Report). The Revised Report groups NPD into three distinct categories based on granularity: (a) raw data, (b) aggregate data, and (c) inferred data. Sharing raw data and aggregate data is mandated while inferred data has been recognised as proprietary, and hence, exempt from the data sharing obligation.
The Report recognises that as per Section 2(o) of the Copyright Act, 1957, original compilations of data in databases are protected as literary works. This means that the manner in which the data has been selected and arranged is protected if there has been a minimum degree of creativity involved. Sui generis database protection is not recognised in India inasmuch as our Courts do not protect the mere investment of labour into collecting, aggregating and storing data (discussed on the blog here). In an attempt to get around this, the Report notes that complete raw datasets may not be collected and data sharing is mandated only for designated HVDs where the fields for data to be shared are pre-set and relatively straightforward. The Committee concludes that such extraction as per pre-determined fields would not violate database copyright (9.3[iv], Revised Report). This is an incorrect and convenient conclusion to arrive at. A subset within a dataset is not necessarily raw data. A subset in itself could also be a result of original compilation of data, but the Report does not acknowledge this possibility. Data businesses that collect a wide range of data make decisions regarding its use as well as the kinds of data to be collected, and the many subsets within subsets into which it is to be sorted. These compilations, as well as the underlying data therein, confer competitive advantages on these businesses and can be protected by copyright and as trade secrets.
Trade Secrets and Data Sharing Obligations
The Report notes that when data sharing would entail access to private companies’ trade secrets or other proprietary information regarding their employees/internal processes and productivity data, the same would be exempt from data sharing requirements (8.6[i], Revised Report). Thus, the revised version insists that it does not compel businesses to share proprietary data. However, many assumptions in the revised framework misunderstand the nature of data and the IP protection that it may be subject to.
Trade secrets in India are protected under contract law, or in the absence of a contract, under an equitable duty of confidence. For instance, data collected by manufacturers of a particular IoT device (excluding unanonymized personally identifiable data), aggregated and compiled in automated logs and similar records, can arguably be considered a trade secret. This data may be confidential or secret if it is not open to independent discovery by others, and is obtained from private goods, assets and processes. Further, it is well recognised that a trade secret may be constituted by a combination of elements, each of which by itself may be in the public domain, but where the combination, which is kept secret, grants a competitive advantage.
The Report does not seem to recognise this, and mandates that such aggregate data be shared with data trustees who will create an HVD for public access out of it. Notably, a data trustee could be any government organization or non-profit private organization, i.e., a Section 8 company/Society/Trust, which would be responsible for the creation, maintenance and data-sharing of HVDs in India. It would be overly naïve to take a simplistic view of NGOs and Trusts, which can often be riddled with transparency, accountability and misrepresentation issues, and may operate to further the interests of certain private sector entities. Private companies can set up a Section 8 company for non-apparent uses under the garb of public purpose and ultimately gain access to data that they would not have been able to acquire otherwise for their commercial purposes.
A data trustee is obligated to establish grievance redressal mechanisms and owes a duty of care to the community whose data is collected. However, this is not a straightforward process since a community rights framework based on land rights, forest rights etc. is not necessarily applicable to people who form part of a data community. These communities are not organically created and people are not even likely to know if they belong to a particular ‘data’ community.
Further, the framework also runs the risk of promoting regulatory arbitrage. It provides that mixed datasets, which constitute the majority of datasets in a data economy and typically contain inextricably linked personal and non-personal data, will be governed by the Personal Data Protection Bill instead of the NPD framework. The threshold of inextricable linkage is a subjective one and may require only showing that separation of personal and non-personal data is technically or economically unfeasible for the firm in question. This allows firms to store data in a fashion that exempts them from regulation under the NPD framework. Data businesses are thus disincentivised from anonymising data if they want to avoid losing their competitive advantage by mandatorily sharing their data with data trustees.
In Part II, I discuss the shortcomings in the Revised Report’s justifications for overriding IP protection and other rights over NPD. I also briefly explore alternatives to the Report’s heavy-handed regulatory architecture by turning to possible solutions within competition law.