Often the question arises on how can Snowflake customers secure their data at a row level, or by organization, or by some other slice of meaningful delineation. This is a very common use case for most customers especially in analytics and reporting. Join us for a conversation of best practices and hands-on coding on securing data with secure view and UDFs.
Anyone interested in the code we use during our events can check us out on GitHub.
Check out our Meetup video below to watch an overview of this event:
Other Meetups and Events to check out:
Transcript of the Meetup video:
Welcome to another Carolina Snowflake meetup group event. Today we’re going to be going over Secure Views in Snowflake. Our previous meetups we record and we put them on our YouTube channel, which you can find in the discussion of the meetup.
And if anybody is interested in those, just go check on that discussion board and there’s a link in there. Today we’ll do a quick introduction, Snowflake refresher and talk a little bit about View Security in Snowflake, kind of overview of what that is, use cases and then we’ll have a walkthrough and then we’ll have a little Q and A session for anybody that has any questions, right?
Mike is going to take it from here. For those of you that are maybe new to Snowflake, we have a little bit of an overview. But for others that are veterans, you already know this or maybe you’ve even hopefully gotten deep into many of these within your organization.
Today’s chat, regardless if you’re a long time Snowflake user or not, is for you when you think of what Snowflake is, Snowflake, the Data Cloud at the end of the day allows you to take data rather than on prem in the cloud.
Third party data, log data, IoT data, sensor data and really bring all that data together into that single cloud location. And from there you’re able to determine which users should see which data and what format should they be seeing the data.
When I say format, that is already bringing multiple data sources together. Are you providing data warehouses, data marts? Or perhaps you’re even exposing some of the transactional data to some of your different data consumers.
At the end of the day, it’s the single location. It’s the same data, whether you have that data aggregated or broken out into its own transactional format. Everyone’s using the same data at the end of the day, and you get the power of the cloud and you’re only paying for what you actually use.
Gone are the days of trying to figure out, what are my next three to five years from capacity planning and buying all of that now and then, hopefully, if we’ve guessed right, grow into that over a period of time.
A unique thought process that allows you to save money and move even faster. You’re not installing any software in order to using Snowflake, and you can choose which public cloud between Amazon, Azure and Google that you want to run Snowflake.
For example, if your organization is a large AWS shop, great. You can use the complimentary services that AWS has to offer with Snowflake running in your desired region. Again, lots of power from that perspective.
And all of that comes from an out of the box perspective with Snowflake with your annual subscription. All right, so switching gears to view security and Snowflakes. When we think about Snowflake, we have data, like Mike was saying, and then at the end of the day, some of that data can be open from a centralized point of view to.
A large number of users or it can be restricted to certain select users. And so one of the ways Snowflake allows us to do that is through something called View security. And in other tools this might be called rulebased access control RBAC.
We’ll talk about that a little bit. So really it’s this ability to restrict access to underlying data by using a view object which overlays basically a table object act that provides this transparent filtering of data to allow for data privacy.
We’ll talk more about different versions of Snowflake and different capabilities. But when we look at the basic additions of Snowflake, roll level security or View security is available really at all levels for the most part.
And then there’s another higher level called column level view security. And we won’t get into that too much today, but that’s mainly available on the enterprise and above plans for Snowflake. When we think about implementing secure views or view level security inside of Snowflake, if you think about it from a development perspective, level of effort perspective, it really is a technical piece of development that’s going to take place.
And then there’s a little bit of scale involved here, but it is flexible and there’s some other components as well. They give that ability and at the end of the day, it’s a simple method to deploy. It doesn’t involve a lot of extra work, but it will involve a little bit of maintenance on an ongoing basis, depending on if you’re changing your roles, if you’re changing you, your different type of permissive levels and so forth and so on.
So what we’re going to talk about next are just some general use cases. And so let’s dive into some use cases really quickly. So we can think about sales and marketing teams. Perhaps there’s different sales teams in the organization that need to look at the different the data differently.
And perhaps those sales and marketing teams should not have access to other teams either within the organization. Maybe you have some sort of franchising concept, you have different sales territories, anything where you can think of delineating.
Perhaps you’re implementing Snowflake, what we might call multitenancy environment or at least a type of multitenancy structure architecture where you might have different customers or clients and those different customers and clients should not see each other’s data or look into another customer or clients data.
But you want to centralize that data into a single data mart or data warehouse. So this will be a great use case for not just real up security, but view secure views marketplace and vendor data. There’s capability there.
So if you’ve been a Snowflakes marketplace, this is a concept that aligns almost one to one with the Snowflake marketplace and the companies behind providing their data to other Snowflake customers. I mentioned before any type of really departmental or hierarchical alignment of roles to permissions.
And then one that we see quite often is when we migrate systems that have been traditionally on premise but oftentimes on the cloud now. And we migrate those operational systems like Oracle EBS or PeopleSoft or Salesforce.com.
And when we migrate that into the cloud, the roles have already been established in those operational systems. So what we want to do is we want to find a way inside of Snowflake to respect those roles that still exist in those systems that are in use today.
But we want to provide it in a way that emulates it correctly, almost in harmony. And then still be able to provide an extra layer on top of that if necessary. And so that’s a typical use case that we see with a lot of different companies in the wild.
So when we think about a very general problem of securing data, let’s take a look at what we also see often, which is sales territories and sales groups within organization. So let’s walk through this example.
So if we had a sales team that’s going to be divided into regions or territories, then the sales data is typically centralized. Either an operational system and then that data that sales data is then synchronized or ETL or ELT over into a data warehouse, where, again, it’s traditionally going to be in the same database or in the same schema, same set of tables for efficiency.
But the sales team, each individual in the sales team, whether it be a manager or a field rep or something like that, or business development, they have different levels of responsibility. And typically those levels of responsibility align with levels of access.
So if you’re a manager across regions, you typically have responsibility for people that report up to you, so they’re underneath you and they might have different territories, right? So then sales team leaders should not see other sales team leaders regional territorial sales data, that could be a conflict of interest.
You might have something like different incentive plans or salaries and you don’t want to see one sales or territory region manager seeing the other regional managers data. So as you can see in a diagram on the left right, there might be a complete roll up to that regional manager or the vice president of sales, something like that, so that person can see all the data underneath.
But if an individual is only at one of those lower hierarchies, they’re. And they should only see the hierarchy level they’re at and underneath that data. Or if they’re at the lowest level, then they should only see that lowest level of data.
So if we look at the diagram here, this is pretty interesting, right? So we’re going to walk you through this. So this is an animated slide and we’re going to step through it animate by the nation. So let’s pretend we’ve got two sales managers.
We can give them whatever names we want. And one is a manager of the western region, one is a manager of the eastern region. And so then what we might have as we build it up is we have maybe a sales lead or a territory lead of the Mountain and west region.
And so that individual can see the data at the Mountain and west region and in this case from Nevada, but they cannot see the western region total data. Maybe there’s other regions like south or something like that.
So they won’t be able to see that that information. If we look at another, maybe a business development rep on the eastern region sales team that might just have the territory of Virginia. So we can see here that this business development rep will only be able to see the data for Virginia.
They won’t, he won’t be able to see anything above Virginia. So he wouldn’t be able to see like a mid Atlantic rollup or an eastern region roll up. He can’t see data for Ohio or Washington DC. As that’s probably going to fall the mid Atlantic or the eastern region.
So what’s going to happen next? So if the eastern region manager tries to see the data from the western region, that should not happen, right? We don’t want that to happen. And if the business development rep.
In Virginia wants to try to look at data from Nevada, that shouldn’t work. And if they want to look at a higher level, like in the mid Atlantic, that shouldn’t work either. So this restriction of the data as it would exist in the same table, in the same schema, same Snowflake database, this restriction can be accomplished by using secure views.
Okay, so let me switch over to a little bit of a walkthrough on how we can do that inside of Snowflake. So we’re going to take a look at some code, and I’m going to ask the team to confirm what they’re seeing so that we are making sure that we’re looking at the right thing.
We see your Snowflake worksheet. Excellent. That’s what I was hoping for. All right, so we’re going to walk through some code here real quick. Let’s see what I left over here. So we have our meet up schema here.
We don’t have anything in it right now. And so what I’m going to do is I’m just going to create a mock up schema. Schema already exists, and we see if we have any views in that schema. We do not. I’m going to create a table called W Sales Fact for our mockup data warehouse and actually let me switch my role.
Sorry. It’s where I want to be. Okay. So I’m going to go and create Sales fact. Actually, let me do one more thing. I switch to the wrong role. Let us try that again. Switch to the right rule, and we’re going to make sure we have the schema we should have that we don’t have any views.
And let’s see if we can create this fact table. Yes, fact table created. Now what we’re going to do is we’re just going to populate a little bit of sales data. Inside of the sales fact table we just created.
So it’s just, you know, hey, these are kind of SKUs and products. And, and what we’re doing here is it’s a flat table. So what we’re doing is just aligning it to a region. So you can see here Nevada, Colorado, Virginia, just like the slide showed us a few minutes ago.
So let’s go ahead and insert all those records. So we have about eleven records coming through, and we can take a look at those records, kind of move back a little bit. So we can see we have the records showing we have product name, sales regions, maybe a roll up to what team that is, or denormalized table here, what was the sales price and then maybe the quantity for each one of those transactions.
So now you can see if I’m the sales region manager or business development person, I’ve got Colorado, Nevada, I’ve got Virginia, North Carolina. So the data is all mixed into the same table. And this is very traditional.
There’s absolutely nothing wrong with modeling your data in this way. I would say probably 99% of the organizations that use any form of analytics or data warehouse data modeling will model a transactional to denormalized fact table in this way.
And I don’t care if you’re doing Data Vault or Kimball Dimensional. It doesn’t matter. You’re going to model a table this way for efficiency. And so what we want to do is, how do we determine if a user who should be in the Virginia region should only see Virginia data?
Let’s take a look at that. What we’re going to do is create what we call a reference table. And this reference table could exist in the same schema, or it could exist in a schema outside of the main schema where the fact table exists.
So we’ll run this, we’ll create it very simple. And typically we’re going to create a little bit of a more complex reference table that holds hold the security. But just know that this is really common practice in order to create a secure view within Snowflake.
So we just have a table with two columns. Now what we’re going to do is just insert and populate that table with some basic data. Now the important thing to note here is that we’re creating, creating a reference table.
And what we want to do is align the level of granularity or what they should be able to see with a role. Now mind you, this access role is a Snowflake role and this is how Snowflake secure views work.
You’re aligning the role of the users that exist in that role to the level of privilege or the level of granularity for which they should have access. I have six rows inserted in there and you can see just a list of regions here.
Virginia, California, Colorado, so forth and so on. And now what I want to do is just because this is a demo and we’re not doing a full blown out implementation, we don’t have a lot of existing roles that align to territories and things of that nature.
But you can really make sense of what we’re doing here. What I’m going to do is just assign the ability for the default public role to use the schema and be able to select from the view. And you’ll see that work here in a minute.
So I’m going to run the permission to grant the role public access to this new schema and then I’m going to grant access to the role public to basically conduct a select on all the views that get created in this example.
And notice I’m not doing future views. So I just created. Well, you’ll see in a second. And notice that I’m using the demo roll. So I’m in the demo roll and I’m granting access to the public role. So now we’re going to go and create the view.
If I break this apart just a little bit so it’s maybe a little more legible, what are we doing with this view? So let’s break this apart on line 68, creating a replacement of you very standard sequel.
But I’m adding this different into qualifier here. I’m adding the secure view. So instead of it just being a regular view, this is pretty standard ANZ SQL. I’m changing it to a secure view which becomes more let’s just call it Snowflake sequel.
Now the secure view uses pretty much all the same syntax as a normal view, create the view and I’m calling it VW Sales fact as and I’m selecting from that fact table. But what I’m doing differently here is I’m predicating on the sales region which I showed about, so Virginia, Nevada, Colorado, et cetera.
And then I’m doing a sub query where I’m pulling the one identifier which is the access level which associates to the region in the fact table. So I’m associating the region in fact table to the reference technically region in the reference table.
And then in the reference table I’m doing a predicate or filter on the access role. So it should only show me the regions where the access role is in play for the current terminal. So let me go ahead and run this views created and with the demo rule, let’s see what we have.
So I’m going to run this and you can see here I’ve done a select from my view and I can only see the sales records and detail from Virginia which is. If I just scroll up slightly here and I take a look at let’s look at our DML here.
If I take a look at where Demo Roll is in this list, then I can see the Demo Roll only has access to Virginia. I want to look at that inside the grid here, just to be a little more clarifying. We can see here that for the Demo role that I have the access to Demo role, which I have right here, which is my current role, then I can only see Virginia.
So let’s move down a little bit and let’s go ahead and change the role to public. So I’m switching the role to public. We can confirm that not only up here, but we can also confirm it by running the select.
And I can see that my role has changed to Public kernel. And now let me select from the view. Look at that. So the view does not exist or not authorized. And in this case, it’s actually not authorized.
So we did this on purpose, just as an elementary refresher. So once you grant a permission, particularly select on a set of objects, in this case, all views or an individual object, if you do it before the object is created, as we did here.
So I granted it here, but then I didn’t create it here. Then the grant doesn’t actually take effect on the object because the object wasn’t available when you ran the grant permission. So what I have to do here is switch back to Demo role, who owns the objects on the owner of that object.
You guys. I know this would be an extra security lesson, not just. Secure view lesson. And now I can run the grant permission and you can see one object is impacted. So now I can grant that. So now that now if I switch back roles, let me just actually go back here, switch back to the public role.
I can see that I’m in the public role. And now if I run this, I should be able to see I don’t have any Warehouses. Dope thought I had Compute, so I don’t have a Warehouse available, actually, to me for some reason.
But if I did run this, I would be able to see that I have access to all the ones that are in public. Let’s see if I can actually modify that here in a minute. So that is the gist of secure views. And I’m going to switch back to the slide deck and do a little bit of a recap here.
And in the time being, if I can get that Warehouse running, we can see it run all the way through with the public role access. Well, let me just switch over here. Got it running also. So we actually walked through this example.
So the user coming out of Virginia was only able to see data from Virginia. So at the hands on exercise. So very cool. Now, if we think about the steps that we took, right, really it’s a very complex.
We could make the reference tables complex as we want to make it. We can obviously model our tables the way that we want, hopefully following some level of best practice. But really what we want to do is determine which objects we want to apply security on, because not all objects we’ll.
Require row level security. We might have some tables, like dimension tables, that don’t require that level of security. But if you do apply the secure views on top of one of the access points, sometimes it makes sense to do it to almost all of them.
And there are some cases where there’s actually a need for that. But we’ve seen really a mixed set of secure views as it relates to data warehouses, data marts and the like. So your mileage may vary, but one of the first things you want to do upfront is really understand what you are filtering.
And sometimes this helps to do that at the very beginning of a project. And probably one of the near last things is just test, test, test, because you want to look at where the roles are coming from.
Do those roles that you’re applying the security to align with an operational system? If so, then it’s more of an end to end process. The users in the operational system should have very similar security, visibility experience that they have with your data and Snowflake or data warehouse and so forth.
Then also you want to document this somewhere. But ultimately you want to understand is this almost like an array of user to roles? Because you could wind up in a situation where you have a very low, low level of granularity and you start assigning roles and next thing you know you have a plethora of roles and then maintenance becomes a bear.
So drawing it out, doing whiteboarding, almost taking us to the level of first drafting this into an Excel spreadsheet, those types of things might really help. And then probably one of the last ones in testing is just again, we’d like to go back to will this be something required by an API or another system?
Might you use DBT or DBT cloud or transformation tool where you’re going to be using a certain service account that’s going to need a higher level of permission to basically see all the roles. So this is something else to take into consideration as you’re using secure views.
When we think about V security, do we really need it? Do we have to have it? It rather and no, you don’t. But there’s just a lot of cases where this is an excellent add in. When we think about clean rooms, when we think about multi tenancy, when we think about restricting access for policy reasons, it is a great add to the Snowflake platform.
And then again, if you’re looking at even more control, like PII type of data, then the higher versions allow you to do things that get to the column level, which is really important. So that you’re not doing double work, you’re not creating five different tables that just have different subsets of the data because you didn’t know how else to to restrict or model this.
So these types of tools in your toolbox really, really add to your flexibility and your capability to be fast developed quickly, get a great return on investment, and then again make the best out of the product in general.
All right, so I think at the end of the day, it’s a thumbs up for secure views and view security. All right, I’ll turn this back over. But just to throw out one thing. Looking at Snowflake, looking at the utilization of your Snowflake account and across your organization is super important.
And so we’ve actually been privy to develop this great. Platform where we can view what our daily, almost down to the minute usage of Snowflake is when we look across queries, queries that are causing problems, queries that are run on queries looking at the cost and spin of Snowflake and Snowflake accounts through DLH IO.
So if anybody is interested in taking taking a look at their Snowflake usage from an analytical perspective, definitely reach out to us, leave a comment and again, Data Lake House AO if you’re synchronizing Data, look at Data Lake House as an option.
We are growing the community on Data Lake House so if you’re interested in getting a super cool pair of socks potentially before winter is over, definitely take a look at joining us and all the cool things and all the awesome conversations we’re having on our Slack channel.
Love to have you guys join that we’ll put the link into the meetup comments and you can just click on that and join and then pick pair of socks that you guys might be interested in. We’ll send those over to you.
So I think we’ll open it up for some QNA yes. So feel free to put your question into chat here and we got some questions rolling in and so one of the first questions is are there performance considerations between a secure view and a non secure view?
That is a really good question. Ah you know, a while back we did a kind of a very light benchmark on this and I haven’t seen one from Snowflake ever. There has been some discussion on some Snowflake community boards where they say it’s just immaterial.
Think it’s it’s it’s almost not quantifiable from the perspective of any performance drawback. I mean, obviously you’re doing a sub query, but you know, your reference table is should is and should be very small when you’re looking at role to permission or level of granularity.
And because the the offset, even if you have millions, hundreds of millions of rows, you know that that lookup is so small on in theory that table is ultimately cashed out as a reference table. So I don’t think there’s any material performance concern.
Okay, perfect. Let’s see when you’re using all right, let me see if I can get this one here. This one might be too late in the day where my eyes aren’t working quite right here. So bear with me. Stumbled to this one.
So when you’re using secure views and data sharing, are there particular let me back up. How would you handle secure views and data sharing? One, should you do it? And two, if so, any pitfalls or got you when going down the data sharing route?
Yeah, I feel like this is a question on the Snowflake Administration certification exam. But yeah, at the end of the day. Right. From a data sharing perspective, we do this fairly often. The flexibility is there.
Right. So should you or should you not do it? I think it boils down to this scenario that you’re in, how you want to who’s that organization or. Or that individual or that tenant that you’re sharing that data with, and then how are you breaking it out for your share?
Right? So that’s kind of where you start on your side. So from your Snowflake account perspective and then there’s probably a couple of different ways you can go about it. I think that it does empower empower for sharing that data to use view security because you’re in complete control and then you’re just really sharing out the view, if you will.
But we’ve seen some situations where it just makes sense to spawn off a subset. So you can very easily create a share that’s being populated every, for example, 1 hour or whatever frequency from your DBT or your data form model.
Right? Let’s just go dbt Cloud. It’s a great tool. So maybe you have a frequency set up for every 2 hours. And as part of your data sharing policy, you have an SLA or an Slo of every 2 hours and your window to transform your data falls easily within that.
So then the expectation is every 2 hours you have this separate schema or a separate set of tables or even database that you’ve shared out, particularly for that tenant or that customer. And it holds then absolutely only the data they should see.
So I think your models might vary in this situation where secure views could be a great use case, but there could be other options when it comes to data sharing that might fit the bill a little bit better.
Okay, perfect. And then one final question and. Sure we’ll get some chuckles on this one, but can you help us implement secure views as we are new to it and we have a lot of HIPAA data? Absolutely. We help Snowflake customers all the time.
We are one of the top partners for Snowflake on the East Coast. We host the meetups, we do a lot of whiteboarding lunch and learns and Data for Breakfast.
Just really helping people get the most out of their Snowflake account and their Snowflake investments. And because we’ve done implementations so frequently of all sizes, we’re probably the best suited to help out Snowflake customers, whether they’re just starting off and just need to get help with the basics or they have much larger let’s.
Call it Enterprise Opportunities and challenges, large migrations, things of that nature, where they might have very much a phased or milestone based approach in their program. And they start off, perhaps, with the migration, but then they need to then come back and apply a layer of security on top of it, using secure views.
So absolutely. Just drop us a comment, set up some time to chat with you. All right, perfect. Thank you. Those are all the questions that I see come through. Thank you everyone for asking questions.
Next month we’ll be going over Apache Iceberg on Snowflake and providing general information about Apache Iceberg. And then in February, we’ll be going over Snowflake micro partitions, so we’ve got some good stuff coming up and more to come.
Also, we’re now a part of the modern data professionals meet up network, so we’ve also got a couple of groups in there too for Midwest and East Coast, where we’ll be having some in person events coming up soon.
Thanks, everybody. Have a great night!