Snowflake 101: 5 ways to build a secure data cloud

Today, Snowflake is the favorite for all things data. The company started as a simple data warehouse platform a decade ago but has since evolved into an all-encompassing data cloud supporting a wide range of workloads, including that of a data lake.

More than 6,000 enterprises currently trust Snowflake to handle their data workloads and produce insights and applications for business growth. They jointly have more than 250 petabytes of data on the data cloud, with more than 515 million data workloads running each day.

Now, when the scale is this big, cybersecurity concerns are bound to come across. Snowflake recognizes this and offers scalable security and access control features that ensure the highest levels of security for not only accounts and users but also the data they store. However, organizations can miss out on certain basics, leaving data clouds partially secure.

Here are some quick tips to fill these gaps and build a secure enterprise data cloud.

1. Make your connection secure

First of all, all organizations using Snowflake, regardless of size, should focus on using secured networks and SSL/TLS protocols to prevent network-level threats. According to Matt Vogt, VP for global solution architecture at Immuta, a good way to start would be connecting to Snowflake over a private IP address using cloud service providers' private connectivity such as AWS PrivateLink or Azure Private Link. This will create private VPC endpoints that allow direct, secure connectivity between your AWS/Azure VPCs and the Snowflake VPC without traversing the public Internet. In addition to this, network access controls, such as IP filtering, can also be used for third-party integrations, further strengthening security.

2. Protect source data

While Snowflake offers multiple layers of protection – like time travel and fail-safe – for data that has already been ingested, these tools cannot help if the source data itself is missing, corrupted or compromised (like malicious encrypted for ransom) in any way. This kind of issue, as Clumio’s VP of product Chadd Kenney suggests, can only be addressed by adopting measures to protect the data when it is resident in an object storage repository such as Amazon S3 – before ingest. Further, to protect against logical deletes, it is advisable to maintain continuous, immutable, and preferably air-gapped backups that are instantly recoverable into Snowpipe.

3. Consider SCIM with multi-factor authentication

Enterprises should use SCIM (system for cross-domain identity management) to help facilitate automated provisioning and management of user identities and groups (i.e. roles used for authorizing access to objects like tables, views, and functions) in Snowflake. This makes user data more secure and simplifies the user experience by reducing the role of local system accounts. Plus, by using SCIM where possible, enterprises will also get the option to configure SCIM providers to synchronize users and roles with active directory users and groups.

On top of this, enterprises also should use multi-factor authentication to set up an additional layer of security. Depending on the interface used, such as client applications using drivers, Snowflake UI, or Snowpipe, the platform can support multiple authentication methods, including username/password, OAuth, keypair, external browser, federated authentication using SAML and Okta native authentication. If there’s support for multiple methods, the company recommends giving top preference to OAuth (either snowflake OAuth or external OAuth) followed by external browser authentication and Okta native authentication and key pair authentication.

4. Column-level access control

Organizations should use Snowflake’s dynamic data masking and external tokenization capabilities to restrict certain users’ access to sensitive information in certain columns. For instance, dynamic data masking, which can dynamically obfuscate column data based on who’s querying it, can be used to restrict the visibility of columns based on the user’s country, like a U.S. employee can only view the U.S. order data, while French employees can only view order data from France.

Both features are pretty effective, but they use masking policies to work. To make the most of it, organizations should first determine whether they want to centralize masking policy management or decentralize it to individual database-owning teams, depending on their needs. Plus, they would also have to use invoker_role() in policy conditions to enable unauthorized users to view aggregate data on protected columns while keeping individual data hidden.

5. Implement a unified audit model

Finally, organizations should not forget to implement a unified audit model to ensure transparency of the policies being implemented. This will help them actively monitor policy changes, like who created what policy that granted user X or group Y access to certain data, and is as critical as monitoring query and data access patterns.

To view account usage patterns, use system-defined, read-only shared database named SNOWFLAKE. It has a schema named ACCOUNT_USAGE containing views that provide access to one year of audit logs.