In this article, I will walk through the definition of DataSecOps, as well as its main principles today. Because DataSecOps is a new and exciting mindset, these principles will inevitably evolve as practitioners gain more experience with it.
Definition of DataSecOps
Let’s begin with a suggested definition of DataSecOps: DataSecOps is an agile, holistic, security-embedded approach to coordinating the ever-changing data and its users, aimed at delivering quick data-to-value while keeping data private, safe, and well-governed.
Let’s break down this definition in order to understand some core concepts of DataSecOps:
DataSecOps is agile. Today, data-related processes (e.g. changes in data or its accessibility) are much more frequent and more important than they used to be, and many users from different teams access the same data. As such, DataSecOps must be agile, or it will not be able to keep up with the operational changes happening to data.
DataSecOps is holistic. As data democratization enables more users and teams to make use of data, it also introduces a large number of data stakeholders in the organization. These include data owners or stewards, many different consumers (who consume data in different ways), and, of course, supporting teams such as data engineering, security, GRC, IT, privacy, and legal (to name a few). If the only team with a DataSecOps state of mind in an organization is the data engineering team, the security team, or even both, this approach will not suffice.
DataSecOps is a group sport, and everybody who is part of the data access and operations cycle should share the same mindset. In many companies, this requires a large number of teams to share the DataSecOps mindset.
A security-embedded approach means that security is intertwined with all data projects and operations. In other words, security should not only enter the picture through a security check at the end of a data project or in an annual audit. Security needs to be a top priority and should be embedded into projects from inception, through design and implementation, and even after they are complete through continuous monitoring.
When we refer to the ever-changing data and its users, we mean that DataSecOps is relevant for data-centric companies, where data and its users undergo a lot of changes. Those changes can be new data sources and producers, enrichment processes, data transformations and movements, platform changes, and more. The users are data consumers and producers who require periodic changes to their data access policies.
And finally, the value derived from DataSecOps is a balance: an organization and its teams can deliver quick value from data without compromising on privacy, security, and governance. Delaying data access in order to keep data safe falls short of DataSecOps, because it delays the organization from turning data into value. We need processes and technologies that enable both at the same time.
The Principles of DataSecOps
Now, let’s walk through the current principles of DataSecOps.
Security Is a Continuous Part of Data Operations, Not an Afterthought
When security is an afterthought, it may result in adverse consequences. For example, say you have a data project aimed at sharing new data with customers. If security is an afterthought, you may encounter security issues late in the process, at a point when changing your approach is very expensive. You may need to go "back to the drawing board" and replace some of the infrastructure or code you created.
In an organization with a DataSecOps mindset, where security is continuously part of data operations and projects, security is actively engaged in the design of such projects to prevent these issues down the line. And because security is part of the entire process, any security issues that do arise can be handled immediately, while the cost of change is still low.
Prefer Continuous Processes over Ad-Hoc Projects
Since data today changes rapidly, and new data objects and users appear very frequently, ad-hoc data projects around security, privacy, and governance tend to become stale quickly.
For example, a sensitive data discovery project that is performed once a year (or even once a quarter) may be risky if the frequency of changes is much higher than that. Ideally, whenever possible, such projects should either be replaced with or accompanied by continuous processes. For example, sensitive data can be discovered in an ongoing process using Satori’s software.
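To make this concrete, here is a minimal sketch, in Python, of what a continuous sensitive data discovery process might look like. This is an illustrative toy, not Satori's actual implementation; the table names, the sampling mechanism, and the PII patterns are all hypothetical and far simpler than a production classifier:

```python
import re

# Illustrative PII patterns; a production classifier would be far more robust.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify_column(sample_values):
    """Return the set of PII types detected in a sample of column values."""
    found = set()
    for value in sample_values:
        for pii_type, pattern in PII_PATTERNS.items():
            if pattern.search(str(value)):
                found.add(pii_type)
    return found

def scan_new_columns(new_columns):
    """Flag newly added columns that look sensitive.

    `new_columns` maps (table, column) to a list of sampled values; in
    practice this would come from the warehouse's metadata and a sampler,
    triggered on schema changes rather than once a year.
    """
    return {
        key: types
        for key, samples in new_columns.items()
        if (types := classify_column(samples))
    }

# Hypothetical run over columns added since the last scan.
print(scan_new_columns({
    ("crm", "contact_email"): ["jane@example.com", "bob@example.org"],
    ("crm", "notes"): ["called on Monday", "follow up next week"],
}))
```

The point is the scheduling, not the patterns: running a scan like this on every schema change (or on a short interval) keeps classification as fresh as the data itself.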
Separation of Environments, Testing, & Automation
In application development and deployment, it is almost unheard of for engineering teams to work without proper separation of environments (commonly testing, staging, and production) and without automated testing of code and configuration changes.
Unfortunately, these practices do not always carry over to data operations. In many cases, there is simply no environment separation, and tests are performed directly in production. Automated tests are often absent as well, especially for security.
In an organization with a DataSecOps mindset, there is a staging environment with the relevant security configurations, and testing is done continuously. In this way, you can, for example, test that specific roles or users do not have access to certain data (e.g. PII) that should be masked from them.
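As a sketch of what such a continuous security test might look like, the following assumes a Postgres-style staging warehouse; the DSN, the "analyst" role, and the crm.contacts table are all hypothetical:

```python
# A sketch of an automated masking test for the staging environment.
# The DSN, the "analyst" role, and the crm.contacts table are hypothetical.
import re

import psycopg2  # pip install psycopg2-binary

# Connect as the deliberately under-privileged analyst role.
ANALYST_DSN = "postgresql://analyst:secret@staging-db:5432/warehouse"

def test_analyst_sees_masked_emails():
    """The analyst role must never see raw email addresses in crm.contacts."""
    with psycopg2.connect(ANALYST_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT email FROM crm.contacts LIMIT 100")
            for (email,) in cur.fetchall():
                # Fail the run if anything resembling a real address leaks.
                assert not re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", email or ""), \
                    "unmasked email visible to the analyst role"
```

Run with pytest in CI (or on a schedule) against staging, a test like this catches a dropped masking rule or an over-broad grant before it reaches production.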
Prioritization Is Key: Focus On Sensitive Data
Since you probably do not have all of the resources you would like to have, you need to prioritize allocation of resources, tasks, and projects. In data, it is almost always the sensitive data that you should prioritize first, as security issues with sensitive data can be, well, very sensitive indeed.
A prerequisite for prioritizing sensitive data is knowing where your sensitive data is as well as what it contains (in order to prioritize within sensitive data). If you’re not sure where your sensitive data is, finding this information out would be a good place to start. Then, explore which measures you can apply to protect it, understand who has access to it, and know whether or not their access can be revoked or limited (for example by masking the sensitive data).
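As a small illustration of the "who has access" step, here is a sketch that lists the roles able to read tables previously flagged as sensitive. It assumes a Postgres-style warehouse; the DSN and the table list are hypothetical:

```python
# A sketch: given tables flagged as sensitive by the discovery process,
# list which roles can read them. DSN and table names are hypothetical.
import psycopg2  # pip install psycopg2-binary

ADMIN_DSN = "postgresql://admin:secret@staging-db:5432/warehouse"
SENSITIVE_TABLES = [("crm", "contacts"), ("finance", "salaries")]

def who_can_read(conn, schema, table):
    """Return the roles holding SELECT on a given table."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT grantee FROM information_schema.role_table_grants
            WHERE table_schema = %s AND table_name = %s
              AND privilege_type = 'SELECT'
            """,
            (schema, table),
        )
        return [row[0] for row in cur.fetchall()]

with psycopg2.connect(ADMIN_DSN) as conn:
    for schema, table in SENSITIVE_TABLES:
        print(f"{schema}.{table}: {who_can_read(conn, schema, table)}")
```

With that inventory in hand, you can decide which grants to revoke and which consumers should instead receive masked views.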
Data Should Be Clearly Owned
Clarity is very important when dealing with data, yet in many cases organizations have data objects without clear ownership. These may be old projects that are inactive but that no one "dares" to remove, or even projects in constant use that lack a clear owner. It is important to establish who owns such data, especially if it may contain sensitive data, and then either remove junk data or ensure its owners know who has access to it and what happens to it (for example, whether it is used by other projects and teams).
Simplified & Deterministic Data Access
While we are on the subject of clarity, data access needs to be simplified and deterministic. This means that if a certain user or team requests access to a certain dataset under certain conditions (e.g. temporary read-only access), they will get the same answer, with the same reasoning, as anyone else with the same case. It is fine if the answer is always no, but building a clear process for granting access to data is imperative.
For example, data scientists may be granted access, upon request, to all datasets except certain restricted ones, provided that sensitive data is masked. When possible, such clear policies can then be automated to allow self-serve data access and streamline the process (remember that DataSecOps is all about quick time-to-value).
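To illustrate, here is a minimal sketch of such a policy expressed as code. The role name, dataset names, and decision rules are hypothetical, but the key property is the one described above: the same request always produces the same answer and the same reasoning:

```python
from dataclasses import dataclass

# Hypothetical restricted datasets; in practice this list would come from
# your data catalog or sensitive data discovery process.
RESTRICTED_DATASETS = {"finance.salaries", "hr.reviews"}

@dataclass
class AccessRequest:
    role: str
    dataset: str
    access_type: str  # e.g. "read-only"

def decide(request: AccessRequest) -> tuple[bool, str]:
    """Return (granted, reason); the same request always yields the same answer."""
    if request.role == "data-scientist":
        if request.dataset in RESTRICTED_DATASETS:
            return False, "dataset is restricted; requires special approval"
        if request.access_type == "read-only":
            return True, "granted read-only, with sensitive data masked"
    return False, "no matching policy; route to the data owner"

# Example: identical requests get identical, explainable answers.
request = AccessRequest("data-scientist", "crm.contacts", "read-only")
print(decide(request))  # (True, 'granted read-only, with sensitive data masked')
```

Because the policy lives in code, it can be reviewed, versioned, and tested like any other artifact, and then wired into a self-serve access workflow.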
Quick Time-to-Value Without Compromising on Security
The last principle is to enable the organization to be "data-driven" and allow data democratization without compromising on security. We can achieve this by making sure there are clear policies and processes around data access, so that when delays in getting access do occur, they happen for "the right reasons" (i.e. a rare edge case that needs special approval) and not because of a lengthy manual process in which all data consumers wait a long time for access. Typical bottlenecks include data engineering teams that must manually configure access settings or unnecessarily duplicate data.
DataSecOps Needs You!
DataSecOps is still young, but it is extremely important. If you would like to be part of DataSecOps, contact me to learn more about DataSecOps and how you can influence it.
Ben is Chief Scientist at Satori, which streamlines data access and security with DataSecOps.