The intrinsic nature and characteristics of human genetic data
The Article 29 Working Party’s Working Document on Genetic Data highlights some of the characteristics which make genetic data unique and sensitive, namely:[15]
- Genetic data is unique and distinguishes an individual from other individuals;
- Genetic data may also reveal information about the individual’s blood relatives (biological family), including those in succeeding and preceding generations, and carry legal implications for them;
- Genetic data can characterise a group of individuals (e.g., ethnic communities);
- Genetic information is often unknown to the individual and does not depend on the bearer's individual will, since genetic data are in principle non-modifiable (it is important to recall, however, that contemporary scientific developments enable the modification of genetic data through, e.g., gene editing technologies such as CRISPR);
- Genetic data can be easily obtained or extracted from raw material although this data may at times be of dubious quality;
- Considering the developments in research, genetic data may reveal more information in the future and be used by an ever-increasing number of agencies for various purposes.
These and other characteristics illustrate the very particular nature of human genetic data, which poses substantial challenges for the application of data protection legislation and principles. Among other challenges, the processing of human genetic data involves a tension between two essential sets of interests which must be balanced. On the one hand, there are the rights and interests of the data subjects whose genetic data are processed, together with a general interest in protecting privacy and in controlling the use of human genetic data and associated technologies so as to safeguard the lawfulness, fairness and transparency of that use. On the other hand, there are the interests related to the role of human genetic data in the advancement of science, in medical research and in other purposes which are instrumental to the progress, development and well-being of society, notably in matters related to biology and biotechnology, health and security.
The collective dimension of human genetic data
Because biological relatives share genetic characteristics, the processing of genetic data may reveal health risks or establish biological relationships pertaining not only to the data subject concerned but also to others, including extended family. Thus, data processing operations may affect not only the right to privacy of one individual but also the privacy of a group of individuals.
This raises serious and complex legal issues related to data confidentiality, such as whether the data subject’s biological family has a right of access to genetic information. In this context, it is challenging to strike a balance between the data subject’s right to disclose (or not to disclose) his/her genetic information and the potential implications and consequences of its disclosure (or non-disclosure) for family members. In essence, two major and interrelated concerns are involved: (i) do genetic data belong exclusively to the individual from whom they are collected or, in contrast, to a group of individuals? and (ii) do family members have the right to access such data without the consent of the individual concerned?
In its Working Document on Genetic Data, the Article 29 Working Party (hereinafter also “WP29” or “Working Party”) stated that “to the extent that genetic data has a family dimension, it can be argued that it is ‘shared’ information, with family members having a right to information that may have implications for their own health and future life”.
This would, however, have major implications for the interpretation and application of data protection legislation and principles. Such a solution would entail, for instance, that other family members could also be considered “data subjects” or, alternatively, that family members, albeit not data subjects in relation to such data, would nonetheless have the right to access and receive this information due to the importance of the interests at stake from their perspective (traditionally, the exercise of the right of access and information is legally contingent on the status of data subject and is not exercised by third parties on the data subject’s behalf).
The most relevant legal instruments regulating data protection take an individualistic approach to data protection principles, which is in tension, or at least not fully compatible, with the legal issues raised by the nature and characteristics of human genetic data. It should be noted, however, that other relevant instruments, such as Recommendation R (97) 5 on the Protection of Medical Data, the Statement on DNA Sampling by the HUGO Ethics Committee and UNESCO’s 2003 International Declaration on Human Genetic Data, take a different approach. These instruments take into account the “significant impact on the family”, refer to the characteristics within a “related group of individuals” and to information concerning “family members of the individual”, and go as far as stating that “special consideration should be made for access by immediate relatives”.
Legal basis, purposes and rights in the context of human genetic data processing
Considering the variety of uses to which human genetic data may be put, the vast array of information which may be extracted from its processing, and the number and diversity of purposes for which such data may be used and further processed after collection, it may often be challenging for data controllers to determine precisely and in good time the purposes, legal basis and data protection rules associated with such activities. For instance, the idiosyncrasies of human genetic data processing make it challenging to uphold the purpose limitation, data minimisation and accuracy principles, especially in light of the complexity and sensitivity of the information extracted from genetic data, which entails a substantial risk of misuse and/or re-use through additional analysis of the original data.
Furthermore, such elements (such as the legal basis and the purpose of the processing) will most likely have to be determined at a very early stage, before the data processing operations take place (data controllers ought to determine and clearly explain data processing details to data subjects upfront), and under conditions which may hamper the task of properly defining the corresponding legal requirements. For instance, the conclusions which will be drawn from human genetic data processing, as well as the specific context in which and the ends for which these data will be used, may not be entirely known at the time of collection (and may vary according to several dynamic factors, such as time, budget and technical capabilities, among others). This would probably be the case for long-term medical and scientific research projects at an early stage.
The determination of the legal basis pursuant to Articles 6 (lawfulness of processing) and 9 (processing of special categories of personal data) of the GDPR is contingent on the specific purpose for which such data will be used and, in turn, has a significant impact on the application and relevance of the data subject’s rights. All these elements are intertwined, and all of them vary substantially according to the context, operational reality and background in which human genetic data are to be processed. It may therefore be a complex challenge to determine, with the necessary degree of precision, the rights and obligations which arise from a particular data processing operation. This could jeopardize the fulfilment of data subjects’ rights which, in turn, could hinder the data controller’s operations.
Moreover, the specific manner in which certain principles and rights are to be applied is not always entirely clear in a human genetic data processing setting.
For instance, applying the data accuracy principle and the right to rectification may be challenging, since the processing of genetic data often yields insights and results which may contain errors that are intentionally kept within certain margins for various purposes (e.g., in order to assess and continuously improve the efficacy and efficiency of genomic data analysis). In addition, certain conclusions deriving from the results of genetic data processing may be a matter of scientific opinion, which depends on several factors, such as the available scientific knowledge or the specific scientific area. Similarly, despite the existence of certain genetic data modification methods and technologies, e.g., gene editing and genetic engineering technologies, genetic data and the media in which they are kept have fundamental non-modifiable characteristics which make rectification difficult.
Non-discrimination and non-stigmatisation
The characteristics of human genetic data (such as its permanent and essentially irrevocable nature) bring about severe risks of discriminatory or stigmatising treatment, for instance, in the context of employment and insurance. In an employment scenario, for example, an employer might decide to dismiss a worker on the ground that the employee has a genetic predisposition to cancer and thus might be on sick leave for several months. Likewise, in an insurance scenario, an insurance company might refuse a health insurance policy based on knowledge of the individual’s genetic predisposition to develop cancer.
The Article 29 Data Protection Working Party considers that the processing of genetic data in the context of employment should be prohibited in principle and authorised only under exceptional circumstances.[16] The Recommendation of the Committee of Ministers on the processing of personal data in the context of employment stresses that human genetic data must not be processed for purposes related to the assessment or evaluation of the suitability or performance of employees or job applicants, regardless of whether the individual concerned has consented to such processing.[17]
At the same time, the Working Party considers that the processing of genetic data for insurance purposes should be prohibited in principle and authorised only in exceptional circumstances, which must be clearly provided for by law. The Working Party further stresses that “the use of genetic data in the insurance field could lead to an insurance applicant or members of his family being discriminated against on the basis of their genetic profile”.[18] The Recommendation of the Committee of Ministers on the processing of personal health-related data for insurance purposes sets out tangible provisions aimed at preventing genetic discrimination in the insurance context, addressing practices such as predictive genetic testing and the imposition of higher insurance premiums and taxes based on a person’s increased risk as inferred from the processing of genetic data.[19]
Furthermore, biometric and genetic data have also been used to create massive databases of genetic profiles for a range of purposes which may interfere with data subjects’ fundamental right to data protection. For instance, human genetic data may be processed by law enforcement agencies in the context of criminal investigations and are often kept at governmental level for various purposes (although the processing of data for law enforcement purposes is regulated not by the GDPR[20] but by the Law Enforcement Directive, Directive (EU) 2016/680, hereinafter “LED”). These data may often be stored irrespective of the outcome or purpose of such activities. At the same time, these genetic profiles and data are frequently accessed and used without the data subjects’ awareness or consent.
Lastly, the integration of human genetic data with other data sets enables links to be established between different types of personal data (such as contact details or an address) and may allow an individual to be tracked and monitored across several different data points with an unprecedented level of detail, potentially affecting them in ways they neither desire nor expect. This may occur, for example, in genomic data applications such as direct-to-consumer genetic testing, whose results may be further used for profiling and commercial strategies based on an individual’s genetic information combined with other personal data.
The processing of human genetic data may also “redefine family relationships, for example, by confirming or disproving paternity, locating previously unknown relatives, or identifying anonymous gamete donors. (…) Thus, when they design, conduct and discuss their research, investigators need to consider how genomic data are used and how the type of use affects whether or not the data are controlled outside the research setting as well”.[21]
The principle of prohibition of discrimination and/or stigmatization, the protection of the dignity of all humans and their rights and fundamental freedoms with respect to the processing of genetic data have been reinforced by several legal instruments, such as the recitals of the General Data Protection Regulation, the Modernised Convention 108 (hereinafter also “Convention 108 +”) and the Convention on Human Rights and Biomedicine of the Council of Europe.
Storage duration
Storage duration, or data retention, is yet another topic of great relevance in the context of the processing of human genetic data. In short, the principle of storage limitation means that personal data must not be kept for longer than is necessary for the purposes for which they are processed.
In the context of medical and scientific research, however, the application of this principle is not as straightforward as one might think, for two main reasons among others: (i) a substantial proportion of the human genetic data processed for medical and scientific research purposes is collected in another context, such as the clinical and healthcare context, and in some cases the processing of such data constitutes a further processing activity pursuant to Article 6(4) (compatibility test) or 89(3) (research exception) of the GDPR; (ii) most data used in this context undergo some sort of digital or automated process which aims to conceal the identity of the data subject. Such processes, however, do not always guarantee the total and irreversible anonymization of the data at stake; often they are mere pseudonymization or encryption processes, in which some link with the identity of the original data subject is still kept.
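The difference between pseudonymization and anonymization mentioned in point (ii) can be illustrated with a short sketch. The Python example below is purely illustrative and is not drawn from any of the instruments cited here: the function names `pseudonymize` and `is_same_subject`, the key and the sample record are all hypothetical. It assumes a keyed hash (HMAC-SHA-256) is used to replace a direct identifier with a token, and shows that whoever holds the key can still re-establish the link to a known individual.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed-hash token (hypothetical helper)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def is_same_subject(candidate: str, token: str, key: bytes) -> bool:
    """Anyone holding the key can test whether a token belongs to a known person."""
    return hmac.compare_digest(pseudonymize(candidate, key), token)


# Hypothetical key, assumed to be held separately by the data controller.
SECRET_KEY = b"example-key-held-by-the-controller"

# Hypothetical research record: the name is replaced by a token,
# but the record remains linkable to the person by the key holder.
record = {
    "subject_token": pseudonymize("Jane Doe", SECRET_KEY),
    "finding": "example-finding",
}

print(is_same_subject("Jane Doe", record["subject_token"], SECRET_KEY))  # True
print(is_same_subject("John Roe", record["subject_token"], SECRET_KEY))  # False
```

In this sketch the record no longer contains the name, yet the link to the individual survives for as long as the key exists, which is why such a record would remain pseudonymized rather than anonymous. Deleting the key would remove this particular link, but it would not by itself make the record anonymous where the genetic data themselves can act as an identifier.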
In principle, genetic data collected for research purposes should be anonymous, as this would ensure compliance with the storage limitation principle. However, several scientific and medical activities which make use of human genetic data require that the results of the analysis can be linked to an individual, and such a link is often necessary to achieve the purposes of the research itself.
At the same time, the characteristics of human genetic data, such as stored DNA, could enable a permanent link to a particular person (even if this person is not directly identified through other factors such as name or gender), given that certain genetic factors are, in themselves, identifiers of a natural person (i.e., these factors may be an inextricable part of an individual’s identity, without the need to further identify the individual through other means).
As mentioned in the Article 29 Data Protection Working Party’s Working Document on Genetic Data, “according to a definition of the task group established by the Danish government to assess the need for further legislative proposals in Denmark, a bio-bank is defined as a structured collection of human biological material which is accessible under certain criteria and where information contained in the biological material can be traced back to individuals”.
Furthermore, it is important to recall that several jurisdictions impose an obligation to retain certain data, such as data related to clinical trials, for long periods. The EU Regulation on clinical trials on medicinal products for human use,[22] for instance, provides in its Article 58 that the sponsor and the investigator must archive the content of the clinical trial master file for at least 25 years after the end of the clinical trial; the medical files of subjects, however, are to be archived in accordance with national law.
Similarly, according to the WP29 Working Document on Genetic Data, “the Dutch data protection authority has been confronted with situations where anonymisation or deletion of data kept in biobanks could substantially diminish the value and functions of such data bases, since the data would no longer be linked to identifiable individuals. Examples are databases for longitudinal researches, sometimes encompassing several generations, such as the cancer registration. Arguments from the field for longer retention periods should be taken into account in such cases.”
This illustrates that, in certain cases, it is challenging to balance the need for data storage and conservation with the principle of storage limitation and, at the same time, that the application of the latter in the context of the processing of human genetic data for research purposes is not always entirely clear.
A different case, however, is the storage of human genetic data for criminal investigation purposes. This matter is regulated not by the GDPR but by the LED. Article 5 of the LED, entitled “Time-limits for storage and review”, states that “Member States shall provide for appropriate time limits to be established for the erasure of personal data or for a periodic review of the need for the storage of personal data. Procedural measures shall ensure that those time limits are observed”.
According to article 11 of Convention 108 +, any exception to the principles applicable to the processing of human genetic data, including storage limitation, must constitute a necessary and proportionate measure to pursue aims provided for by law.
A leading case in this regard comes from the European Court of Human Rights. In S. and Marper v. the United Kingdom,[23] the Court held that the indefinite storage of biometric and genetic data (such as fingerprints, cellular samples and DNA profiles) after the investigation against the suspect had ended was not a necessary and proportionate measure in a democratic society.
Likewise, in M.K. v. France (2013),[24] the European Court of Human Rights found a violation of Article 8 of the European Convention on Human Rights, as the retention of an innocent person’s data for 25 years was not necessary in a democratic society.
Pursuant to paragraph 8 of Recommendation No. R (92) 1 of the Committee of Ministers of the Council of Europe on the Use of Analysis of Deoxyribonucleic Acid (DNA) within the Framework of the Criminal Justice System: “samples or other body tissues taken from individuals for DNA analysis should not be kept after the rendering of the final decision in the case for which they were used, unless it is necessary for purposes directly linked to those for which they were collected”. Additionally, “measures should be taken to ensure that the results of DNA analysis and the information so derived is deleted when it is no longer necessary to keep it for the purposes for which it was used. The results of DNA analysis and the information so derived may, however, be retained where the individual concerned has been convicted of serious offences against the life, integrity or security of persons. In such cases strict storage periods should be defined by domestic law.”
(Cyber)security
Bearing in mind the sensitivity and value of human genetic data, and the growing importance of cloud computing and other data storage and sharing technologies for genetic data storage and analysis, such data are likely to be a prime target for cyberattacks and external intrusion attempts. Thus, the integrity and security of the networks and technologies used in the processing of such data are vital. In this regard, the implementation of appropriate technical and organizational measures is of the utmost importance for all stakeholders involved.
Legal fragmentation
A key underlying and cross-cutting challenge is regulatory and legal fragmentation, especially in the context of international or regional law. Despite the existence of international and European legal instruments (both binding and non-binding), divergent and overlapping interpretations of the rules and terms introduced in international and European law may render the existing rules, best practices, guidelines and standards ineffective or unenforceable, especially given that human genetic data processing activities are often undertaken in a cross-border context.
In this respect, the rapid development of science and technology increases the number of public and private entities engaging in human genetic data processing activities, which in turn leads to an overall increase in legislative and regulatory activity and, ultimately, to the emergence of further national and international data protection rules on human genetic data processing.
In sum, there are several international legal instruments which contain essential definitions and rules pertaining to human genetic data processing (e.g., the GDPR, Convention 108+ or Recommendation R (97) 5 on the Protection of Medical Data), and international and European bodies, such as the European Commission, hold significant powers. Nonetheless, it should be noted that “the monitoring and enforcement of the application of data protection legislation falls primarily under the competence of national authorities, in particular data protection authorities and courts”.[25] Additionally, pursuant to Article 9(4) of the GDPR, Member States may provide further conditions or limitations in relation to the processing of genetic, biometric or health-related data.
Highly dynamic and fast-paced environment
As previously detailed in the section “Stakeholders”, human genetic data is increasingly relevant for a wide variety of purposes, both scientific and commercial. Developments in genetic data science and processing technologies make it substantially harder to tackle the corresponding risks and threats, as well as to adapt the current legal framework and principles to adequately safeguard the fundamental rights of data subjects. Given the ever-changing conditions of technological and scientific development, it is challenging to retain flexibility and adequacy and at the same time comply with the legal provisions and principles (which may become obsolete).
The current legal framework aims to tackle such challenges by maintaining deliberately general, flexible and broad language (both in terms of definitions and of principles). One illustration of this is the definition of genetic data in the GDPR, which is detailed but at the same time based on general terms so that it does not become irrelevant. Article 4(13) of the GDPR, for example, describes genetic data as personal data resulting, in particular, from the analysis of a biological sample. Taken alone, this definition could fall short of capturing the entire reality at stake. However, the joint interpretation of Article 4 (which states “in particular”, suggesting that the definition is non-exhaustive) and recital 34,[26] according to which genetic data are also obtained from the analysis of other elements (and not only biological samples), provides an all-encompassing conceptual definition intended to cover realities which may not be known at the present time.
Nevertheless, genetic technology has only recently started to be widely used, and it is fair to assume that genetic data analysis techniques will only increase in both complexity and scale. The rapid development of and swift changes in such processes, as well as the expected reduction in costs, point to a future that may hold complex data protection challenges in the context of human genetic data processing.[27] Such challenges become increasingly relevant in light of new risks and threats and the need to provide appropriate legal safeguards and guarantees which may not yet be in place.
[15] Article 29 Data Protection Working Party, Working Document on Genetic Data, 17 March 2004, pp. 4-5.
[16] Article 29 Data Protection Working Party, Working Document on Genetic Data, 17 March 2004, p. 10.
[17] Recommendation CM/Rec (2015)5 of the Committee of Ministers to member States on the processing of personal data in the context of employment, article 9(3).
[18] Article 29 Data Protection Working Party, Working Document on Genetic Data, 17 March 2004, p. 10.
[19] Recommendation CM/Rec (2016)8 of the Committee of Ministers to the member States on the processing of personal health-related data for insurance purposes, including data resulting from genetic tests, principle 4.
[20] Article 10 GDPR.
[21] Wan, Z., Hazel, J.W., Clayton, E.W. et al. Sociotechnical safeguards for genomic data privacy. Nat Rev Genet 23, 429–445 (2022). https://doi.org/10.1038/s41576-022-00455-y
[22] Regulation (EU) No 536/2014 of the European Parliament and of the Council of 16 April 2014 on clinical trials on medicinal products for human use.
[23] Case of S. and Marper v. The United Kingdom, Applications nos. 30562/04 and 30566/04, 4 December 2008, ECHR [GC].
[24] Case of M.K. v. France, Application no. 19522/09, 18 April 2013, ECHR.
[25] European Parliament (Committee on Petitions), Notice to Members, 15/03/2019, https://www.europarl.europa.eu/doceo/document/PETI-CM-637225_EN.pdf.
[26] Christopher Kuner, Lee A. Bygrave, Christopher Docksey, The EU General Data Protection Regulation, A Commentary, Oxford University Press, 2020, p. 201.
[27] Explanatory memorandum to Recommendation CM/Rec(2019)2 of the Committee of Ministers to member States on the protection of health related data, par. 69.