Over the last few years, we have seen more enterprises open to adopting public cloud and migrating at least some of their data sets.
Understanding how data can be protected in the cloud is a common question from teams less familiar with this technology. In this two-part blog post, we will first build an understanding of Personally Identifiable Information (PII), and then look at some aspects of protecting this sensitive data in the cloud, especially from a Serverless perspective.
Types of Data
While we will focus on PII data, many of the ideas presented here apply to other types of sensitive data as well. Knowing the types of data your organization has, classifying them, setting sensitivity levels, and establishing policies that address requirements and cloud approval are essential first steps in moving any data to the cloud.
While data classifications may vary across countries and organizations, the Center for Internet Security uses ‘sensitive’, ‘business confidential’, and ‘public’ for high, medium, and low sensitivity levels: high-sensitivity data is protected by law, medium-sensitivity data is intended for internal use only, and low-sensitivity data denotes publicly accessible information.
With internal alignment on types of data and levels of sensitivity, an organization can classify each data set within the organization. Then, a policy can be established for each type-level combination that sets out requirements for moving this data to the cloud.
Application teams can then more easily find out if their application data can be stored in the cloud and determine the specific service configuration requirements to do so in a compliant way.
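The classification-to-policy workflow above can be sketched as a simple lookup: each (data type, sensitivity) combination maps to a cloud-approval decision and a list of service configuration requirements. This is a minimal, hypothetical sketch; the data types, requirement names, and rules below are illustrative assumptions, not an actual standard.

```python
# Hypothetical policy table: (data type, sensitivity level) -> cloud policy.
# All names and rules here are illustrative examples, not a real standard.
POLICIES = {
    ("customer-records", "sensitive"): {
        "cloud_approved": True,
        "requirements": ["encryption-at-rest", "encryption-in-transit",
                         "access-logging", "regional-data-residency"],
    },
    ("internal-reports", "business-confidential"): {
        "cloud_approved": True,
        "requirements": ["encryption-at-rest", "private-network-only"],
    },
    ("marketing-site", "public"): {
        "cloud_approved": True,
        "requirements": [],  # no special controls for public data
    },
}

def cloud_requirements(data_type: str, sensitivity: str):
    """Return the policy for a classified data set, or None if unclassified."""
    return POLICIES.get((data_type, sensitivity))

# An application team checks whether (and how) its data can move to the cloud:
policy = cloud_requirements("customer-records", "sensitive")
```

With a table like this in place, an application team's question changes from "can we use the cloud at all?" to "which controls must we configure?", which is a much easier conversation to have.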
Personally identifiable information (PII) data – a definition
As the name suggests, PII is any data that can be used to identify a specific person, either on its own or combined with other data sets to determine an individual’s identity. Information such as driver’s license details, banking details, government ID numbers, and medical records is considered sensitive PII. Meanwhile, publicly available data such as a person’s race, gender, or date of birth is considered non-sensitive PII.
Note that the definition of PII varies from one country to another. The European Union’s General Data Protection Regulation (GDPR) defines personal data as “any information that relates to an individual who can be directly or indirectly identified”. The definition covers names, email addresses, location information, ethnicity, gender, biometric data, religious beliefs, IP addresses, political opinions and even pseudonyms or usernames that are easily linked to a person.
In Singapore’s Personal Data Protection Act (PDPA), PII includes “any (digital) information that can be used to decipher the owner’s identity.” Under this definition, personal email and mailing addresses, contact numbers, and credit card information are examples of PII. Business email addresses, however, are exempt under the PDPA – one of the significant differences from GDPR. Read more about some of the differences between PDPA and GDPR in our earlier article here.
The importance of PII data
PII is distinct from other types of data in that it can be used to pinpoint a person’s identity. As mentioned, there are two types of PII: sensitive and non-sensitive. Sensitive PII is information that, if leaked, could cause harm to the individual. Thus, sensitive PII needs especially strong protection, and fines for loss of this data may be considerably higher.
More than preserving company secrets or other business-related communications, the significant fines and reputational harm that can follow a breach of personal data are key motivators for organizations to handle it securely. As a case in point, Google was not spared by GDPR: in January 2019, France’s Commission Nationale de l’Informatique et des Libertés (CNIL, National Commission on Informatics and Liberty) issued Google a fifty-million-euro fine for giving vague and generic descriptions of important details such as its legal basis for data processing and its data retention periods, among other violations. The CNIL publicized the sky-high fine to emphasize the severity of Google’s infringements and to serve as a warning to others.
Though many users still patronize Google, some have explored alternatives like DuckDuckGo, which promises never to track users or store personal information. Interestingly, DuckDuckGo’s popularity surges after major privacy events, such as Edward Snowden’s 2013 revelations and the Cambridge Analytica scandal. This suggests that many people, especially privacy-conscious ones, will choose companies that handle their data properly and abandon those seen as careless, vague or manipulative. As such, it pays to be genuine, proficient and vocal about your organization’s approach to privacy.
Defining key concepts
Before we delve deeper into the common requirements of privacy regulations and how we can address them, let’s first define a couple of key concepts.
Shared responsibility is a concept adopted by all major cloud service providers. It is an approach to cloud security in which both parties – the cloud provider and the account owner – share responsibility for the security of the cloud account and the data in it. For example, the cloud provider is responsible for physical access to its data centers, while the account owner is responsible for access permissions to the cloud account itself as well as to any databases in the account. These responsibilities are clearly defined, and together they are critical to the security of the cloud environment.
At a high level, the more abstracted or managed a cloud service is, the more responsibility sits with the cloud provider. With virtual servers in the cloud, the provider’s responsibility encompasses physical access and most of the network. The organization is responsible for firewall configuration, access permissions and everything inside the virtual servers, such as patching operating systems and software to keep them secure.
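The sliding scale described above can be made concrete with a small responsibility matrix. This is a simplified, illustrative sketch; the service categories and duty assignments are assumptions for the sake of the example, not an official cloud-provider matrix.

```python
# Illustrative shared-responsibility split at three abstraction levels.
# Categories and assignments are simplified examples, not a provider's
# official documentation.
RESPONSIBILITY = {
    "virtual-servers": {
        "provider": ["physical access", "network infrastructure"],
        "customer": ["firewall rules", "access permissions",
                     "OS patching", "software patching", "data"],
    },
    "managed-database": {
        "provider": ["physical access", "network infrastructure",
                     "OS patching", "database patching"],
        "customer": ["access permissions", "data"],
    },
    "serverless-functions": {
        "provider": ["physical access", "network infrastructure",
                     "OS patching", "runtime patching"],
        "customer": ["access permissions", "application code", "data"],
    },
}

def customer_duties(service: str):
    """Return what remains the account owner's responsibility for a service type."""
    return RESPONSIBILITY[service]["customer"]
```

Note how OS patching moves from the customer column to the provider column as the service becomes more managed, while access permissions and the data itself always remain the account owner’s responsibility – which is why this series focuses on configuring those correctly.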