Snowflake # Introduction

Snowflake is a SQL data warehouse built for the cloud that runs on the Amazon Web Services (AWS) cloud. Delivered as a data warehouse as a service, it handles all of the following:

Optimization
Tuning
Data Protection
Authentication
Availability
Resource Management
Software Configuration

Customers simply sign up for the service and start using it.

Shared-disk architecture is a distributed computing architecture in which the disks are accessible from all the cluster nodes: multiple processors can access all disks directly via an interconnection network, and every processor has its own local memory. In the shared-nothing architecture, by contrast, each node has access only to its own distinct disks.
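The difference between the two architectures can be sketched in a few lines of Python. This is only a toy model: the `Cluster` class, the node and disk names, and the round-robin disk assignment are assumptions made for illustration, not anything Snowflake exposes.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Cluster:
    """Hypothetical cluster: a list of node names and a list of disk names."""
    nodes: List[str]
    disks: List[str]


def shared_disk_access(cluster: Cluster) -> Dict[str, Set[str]]:
    """Shared disk: every node can reach every disk."""
    return {node: set(cluster.disks) for node in cluster.nodes}


def shared_nothing_access(cluster: Cluster) -> Dict[str, Set[str]]:
    """Shared nothing: each disk belongs to exactly one node.

    Disks are handed out round-robin here purely for illustration.
    """
    access: Dict[str, Set[str]] = {node: set() for node in cluster.nodes}
    for index, disk in enumerate(cluster.disks):
        owner = cluster.nodes[index % len(cluster.nodes)]
        access[owner].add(disk)
    return access
```

With two nodes and four disks, `shared_disk_access` gives both nodes the same four disks, while `shared_nothing_access` gives each node two disks that the other node cannot see.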

Snowflake combines both types of architecture. It consists of three separate layers:

1. Data Storage

2. Compute

3. Services Layer

Each layer scales independently and includes built-in redundancy.

How does it work?

Snowflake organizes data into logical databases, each containing one or more schemas. Each schema contains tables and views. Like standard SQL databases, it stores relational data in table columns using standard SQL data types. It can also store semi-structured data such as Avro or JSON. Both kinds of data can be queried using SQL.
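To make the database, schema and table hierarchy concrete, here is a small Python model of it. It is a sketch only: the catalog contents, the `sales_db.public.orders` name and the helper functions are all invented for illustration, and in Snowflake itself this lookup and filtering would be written as a SQL query.

```python
import json

# Hypothetical catalog: database -> schema -> table -> rows.
# The "details" column holds semi-structured JSON stored as text.
catalog = {
    "sales_db": {
        "public": {
            "orders": [
                {"order_id": 1, "amount": 25.0,
                 "details": '{"customer": {"city": "Pune"}}'},
                {"order_id": 2, "amount": 40.0,
                 "details": '{"customer": {"city": "Delhi"}}'},
            ]
        }
    }
}


def resolve(qualified_name):
    """Look up a table by its database.schema.table name."""
    database, schema, table = qualified_name.split(".")
    return catalog[database][schema][table]


def json_path(document, path):
    """Follow a dotted path into a JSON document stored as text."""
    value = json.loads(document)
    for key in path.split("."):
        value = value[key]
    return value


def cities_for_orders_over(minimum):
    """Filter on a relational column, then read a field from the JSON column."""
    return [
        json_path(row["details"], "customer.city")
        for row in resolve("sales_db.public.orders")
        if row["amount"] > minimum
    ]
```

The point of the example is that one request touches both the relational column (`amount`) and the semi-structured column (`details`) in the same pass.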

Where does the data go?

Snowflake stores all of its data in Amazon S3 cloud storage. As data is loaded into a table, Snowflake converts it into an encrypted format and keeps it in private S3 buckets.

The compute layer is where queries are executed, on Amazon EC2. Snowflake's architecture allows users to create multiple virtual warehouses without performance or contention issues. Users create virtual warehouses and specify the size of each. Each virtual warehouse processes the queries sent to it by users and applications. A virtual warehouse can be scaled up or down at any time. When a user resizes a virtual warehouse, all subsequent queries take advantage of the additional resources immediately. Because compute is separate from storage, Snowflake can scale and serve concurrent workloads without resource contention. For example, different virtual warehouses can handle data loading and querying concurrently, because each virtual warehouse has its own compute resources and accesses the shared data storage layer independently. Any changes to the data are recorded in the storage layer and become available to the other virtual warehouses.
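The idea of several independently sized virtual warehouses working over one storage layer can be sketched as a toy Python model. Everything here is an assumption made for illustration: the class names, the `SIZE_UNITS` table and its numbers are hypothetical and do not reflect Snowflake's real sizing.

```python
# Hypothetical relative compute sizes, not Snowflake's actual size table.
SIZE_UNITS = {"small": 1, "medium": 2, "large": 4}


class SharedStorage:
    """Stand-in for the storage layer that every warehouse reads and writes."""

    def __init__(self):
        self.tables = {}

    def append(self, table, rows):
        self.tables.setdefault(table, []).extend(rows)

    def scan(self, table):
        return list(self.tables.get(table, []))


class VirtualWarehouse:
    """Stand-in for a compute cluster with its own size and no data of its own."""

    def __init__(self, name, size, storage):
        self.name = name
        self.storage = storage  # all warehouses share this one object
        self.resize(size)

    def resize(self, size):
        """Change the size; later queries would use the new capacity."""
        if size not in SIZE_UNITS:
            raise ValueError(f"unknown warehouse size: {size}")
        self.size = size

    @property
    def compute_units(self):
        return SIZE_UNITS[self.size]

    def load(self, table, rows):
        self.storage.append(table, rows)

    def count(self, table):
        return len(self.storage.scan(table))
```

A loading warehouse and a reporting warehouse built on the same `SharedStorage` see the same rows, and resizing one of them leaves the other's `compute_units` untouched.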

The services layer is managed by Snowflake and runs on resources distributed across multiple AWS availability zones to ensure high availability. This layer:

authenticates user connections
manages sessions
secures data
manages virtual warehouses
manages metadata
performs query compilation and optimization

The services layer also coordinates data access and updates across all virtual warehouses and databases, ensuring that once a transaction is completed, every virtual warehouse sees the new version of the data.
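One way to picture this guarantee is a metadata store that only switches readers to a new version of a table when the write commits. The Python below is a minimal sketch under that assumption; the `ServicesLayer` class and its methods are hypothetical and are not a description of how Snowflake implements transactions internally.

```python
class ServicesLayer:
    """Toy metadata store that tracks the committed version of each table."""

    def __init__(self):
        self._versions = {}  # table -> list of immutable snapshots (tuples of rows)
        self._current = {}   # table -> index of the committed snapshot

    def read(self, table):
        """Return the committed snapshot, which is what every warehouse sees."""
        if table not in self._current:
            return ()
        return self._versions[table][self._current[table]]

    def begin_write(self, table, new_rows):
        """Stage a new snapshot without exposing it to readers yet."""
        snapshot = self.read(table) + tuple(new_rows)
        self._versions.setdefault(table, []).append(snapshot)
        return len(self._versions[table]) - 1  # version number to commit later

    def commit(self, table, version):
        """Make the staged snapshot the one that all readers see."""
        self._current[table] = version
```

Until `commit` is called, `read` keeps returning the previous snapshot, so no reader ever observes a half-finished write.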

Users can connect to Snowflake using either the JDBC or the ODBC client driver. Almost all operations in Snowflake can be performed through these clients using SQL commands. Unlike most database systems, Snowflake eliminates almost all knobs and tuning parameters. With Snowflake, users create databases, schemas, and tables, create virtual warehouses, and load and query data. Snowflake handles the rest.

Lifecycle of a Query in Snowflake

Queries are sent to the services layer using any supported client or interface. This layer validates that the client sending the query is authorized to access the data and to perform the operation the query asks for. Snowflake then optimizes the query, and the services layer sends the instructions to the virtual warehouse the user is working with. The virtual warehouse then interacts with the storage layer and sends the results back to the client.
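The steps above can be sketched as a small Python pipeline. This is an illustration only: the function names, the dictionary-based query format and the `grants` structure are assumptions, and the "optimization" step is just a stand-in that tidies the column list.

```python
def authorize(user, table, grants):
    """Services layer: check that the user may access the table."""
    return table in grants.get(user, set())


def optimize(query):
    """Stand-in for compilation and optimization: build a tidy plan."""
    return {"table": query["table"], "columns": sorted(set(query["columns"]))}


def execute_on_warehouse(plan, storage):
    """Virtual warehouse: read from storage and keep only the planned columns."""
    return [
        {column: row[column] for column in plan["columns"]}
        for row in storage[plan["table"]]
    ]


def run_query(user, query, grants, storage):
    """Walk a query through authorization, optimization and execution."""
    if not authorize(user, query["table"], grants):
        raise PermissionError(f"{user} may not read {query['table']}")
    plan = optimize(query)
    return execute_on_warehouse(plan, storage)
```

A user with a grant on the table gets rows back containing only the requested columns; a user without one is rejected before any warehouse or storage work happens.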

How much does the user pay?

Pricing is based on usage. The cost is based on the amount of storage the user's data occupies after compression, plus the amount of compute used to run queries on the virtual warehouses.
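Reading the bill as a storage charge plus a compute charge, the arithmetic can be sketched as below. The function, its parameter names and the rates in the usage example are made up for illustration and are not Snowflake's actual prices.

```python
def monthly_cost(compressed_tb, storage_rate_per_tb, compute_hours, compute_rate_per_hour):
    """Add a storage charge (on compressed size) to a compute charge (on hours used).

    All rates are hypothetical inputs supplied by the caller.
    """
    storage_cost = compressed_tb * storage_rate_per_tb
    compute_cost = compute_hours * compute_rate_per_hour
    return storage_cost + compute_cost
```

For example, 2 TB of compressed data at an assumed rate of 25 per TB, plus 100 warehouse hours at an assumed rate of 6 per hour, comes to 2 × 25 + 100 × 6 = 650.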

Keep Learning 🙂
