Implement A Data Modelling and Partitioning Strategy #TestSeries

What are the scenarios in which data is embedded in a document?

  1. Read or update together – data that is read or updated together is modeled as a single document.
  2. 1:1 Relationship
  3. 1:Few Relationship – for example, a customer with a handful of addresses. This is a one-to-few relationship, so the addresses can be embedded in the customer document.
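As a minimal sketch of the 1:few case, here is a hypothetical customer item with its addresses embedded. The property names and values are illustrative assumptions, not an official schema.

```python
import json

# Hypothetical customer item with its few addresses embedded.
customer = {
    "id": "cust-001",
    "name": "Contoso Traders",
    "addresses": [
        {"type": "billing", "city": "Seattle", "country": "US"},
        {"type": "shipping", "city": "Portland", "country": "US"},
    ],
}

# One item holds the customer and all of its addresses, so a single read
# of this item returns both.
serialized = json.dumps(customer)
size_bytes = len(serialized.encode("utf-8"))
print(f"{len(customer['addresses'])} addresses embedded, {size_bytes} bytes")
```

Because the number of addresses stays small, the item stays far below the item size limit.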

What are the scenarios in which you should reference the data?

  1. Read or updated independently – an update in Azure Cosmos DB replaces the entire item. If a document has a few frequently updated properties alongside a large number of mostly static ones, it is more efficient to split it into two documents.
  2. 1:Many relationship – this applies when the relationship is unbounded. An Azure Cosmos DB item has a maximum size of 2 MB, so when the "many" side can grow without limit, the data should be referenced rather than embedded.
  3. Many:Many relationship
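The "read or updated independently" case can be sketched as splitting one item into a mostly static item and a small, frequently updated one. The helper `split_item`, the product shape, and the `productId` reference property are hypothetical names chosen for illustration.

```python
def split_item(item, hot_fields, key_field="id"):
    """Split one item into a mostly static item and a small hot item."""
    static = {k: v for k, v in item.items() if k not in hot_fields}
    hot = {k: v for k, v in item.items() if k in hot_fields}
    # The hot item references the static one by id instead of embedding it.
    hot["id"] = f"{item[key_field]}-stats"
    hot["productId"] = item[key_field]
    return static, hot


product = {
    "id": "prod-42",
    "name": "Trail Bike",
    "description": "A long, rarely edited description...",
    "specs": {"frame": "alloy", "gears": 21},
    "viewCount": 1034,  # assumed to change on every page view
    "stock": 17,        # assumed to change on every sale
}

static_item, hot_item = split_item(product, hot_fields={"viewCount", "stock"})
print(static_item)
print(hot_item)
```

Updates to `viewCount` or `stock` now rewrite only the small item, while the large static item is left alone.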

What is the maximum storage size of a physical partition?

The maximum storage size of a physical partition is 50 GB, and its maximum throughput is 10,000 RU/s.

What is the maximum size for a logical partition?

20 GB
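These limits allow a rough back-of-the-envelope estimate. The sketch below computes a lower bound on the number of physical partitions for a given workload. The function names and sample numbers are assumptions, and the service manages partition splits itself, so treat the result as an estimate only.

```python
import math

PHYSICAL_MAX_GB = 50        # storage limit per physical partition
PHYSICAL_MAX_RUS = 10_000   # throughput limit per physical partition
LOGICAL_MAX_GB = 20         # storage limit per logical partition


def min_physical_partitions(total_gb, total_rus):
    """Lower bound on physical partitions needed for storage and throughput."""
    by_storage = math.ceil(total_gb / PHYSICAL_MAX_GB)
    by_throughput = math.ceil(total_rus / PHYSICAL_MAX_RUS)
    return max(1, by_storage, by_throughput)


def fits_logical_partition(key_gb):
    """True if the data for one partition key value fits in a logical partition."""
    return key_gb <= LOGICAL_MAX_GB


print(min_physical_partitions(total_gb=120, total_rus=25_000))  # 3
print(fits_logical_partition(35))  # False
```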

What are hot partitions?

When the data is not partitioned correctly, it results in hot partitions. Hot partitions can prevent your application workload from scaling and they can occur on both storage and throughput.

What is it called when one partition stores a large amount of data while the others store much less?

A storage hot partition.

What is a throughput hot partition?

Throughput can suffer from hot partitions when most or all of the requests go to the same logical partition.

What should be the strategy when choosing the partition key for a write-heavy workload?

For a write-heavy workload, choose a partition key with high cardinality. High cardinality means the key has many distinct values, which lets writes spread evenly across all the partitions.
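To illustrate how the number of distinct key values affects write distribution, the sketch below hashes partition key values into a fixed number of buckets. The hash function is a stand-in (Azure Cosmos DB uses its own internal hashing), and the partition count and key names are assumptions.

```python
import hashlib
from collections import Counter


def partition_for(key, partition_count):
    # Stand-in hash; the real service uses its own internal hash function.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def spread(keys, partition_count=10):
    """Count how many writes land on each simulated partition."""
    return Counter(partition_for(k, partition_count) for k in keys)


writes = 10_000
low_cardinality = [f"country-{i % 4}" for i in range(writes)]   # 4 distinct values
high_cardinality = [f"customer-{i}" for i in range(writes)]     # 10,000 distinct values

low = spread(low_cardinality)
high = spread(high_cardinality)
print("low cardinality: partitions used", len(low), "busiest", max(low.values()))
print("high cardinality: partitions used", len(high), "busiest", max(high.values()))
```

With only four distinct values, at most four partitions ever receive writes, no matter how many partitions exist.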

What should be the strategy when choosing the partition key for a read-heavy workload?

For a read-heavy workload, make sure queries are served by one or a limited number of physical partitions by filtering on the partition key in the WHERE clause, either with an equality filter or with an IN operator.

What is a cross-partition query?

A cross-partition query is one where the database has to check every partition. This is acceptable when there are only a few physical partitions, but with a large number of partitions it makes the query slow and expensive.
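The difference between a query scoped by partition key and a cross-partition query can be sketched with a toy router. This is a simulation under assumptions (a fixed partition count, a stand-in hash, and `customerId` as the hypothetical partition key), not how the service routes requests internally.

```python
import hashlib

PARTITION_COUNT = 10  # assumed number of physical partitions


def partition_for(key):
    # Stand-in hash; the real service uses its own internal hash function.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITION_COUNT


def partitions_to_query(partition_key_values=None):
    """Return the set of simulated partitions a query has to visit."""
    if not partition_key_values:
        # No partition key filter: every partition has to be checked.
        return set(range(PARTITION_COUNT))
    return {partition_for(v) for v in partition_key_values}


# WHERE c.customerId = 'cust-1'
single = partitions_to_query(["cust-1"])
# WHERE c.customerId IN ('cust-1', 'cust-2', 'cust-3')
few = partitions_to_query(["cust-1", "cust-2", "cust-3"])
# WHERE c.city = 'Seattle'  (city is not the partition key)
fan_out = partitions_to_query()

print(len(single), len(few), len(fan_out))
```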

Which of the following criteria would define a good candidate to embed two entities in a single document schema?

Read or updated together

Consider the following scenario: Your company has customers across four countries/regions, with hundreds of thousands of customers for countries/regions 1 and 2, and a few thousand customers for countries/regions 3 and 4. Requests for each country/region total approximately 50,000 RU/s every hour of the day. Your application team proposes to use countryId as the partition key for this container. This is a write-heavy application with no TTL being used. Which of the following statements is true?

This could cause storage hot partitions

Why is understanding the access pattern of your application and how to use this information in your data model design important?

 

Understanding the access patterns of your application helps identify how to access your data with fewer requests.

What is used to manage referential integrity in Azure Cosmos DB?

Change feed

What is change feed?

Change feed is an API that lives within every Azure Cosmos DB container. Whenever data is inserted or updated in the container, change feed streams those changes to an API that you can listen to. When a change arrives, you can run code that responds to it.
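As a rough illustration of the pattern, here is an in-memory stand-in for a container with a change log. It is not the Azure Cosmos DB SDK: the class `InMemoryContainer`, its methods, and the customer and sales item shapes are all hypothetical. The handler reacts to each change by updating a denormalized copy held in a second container.

```python
class InMemoryContainer:
    """Toy container that records every insert or update in an ordered log."""

    def __init__(self):
        self.items = {}
        self.change_log = []

    def upsert(self, item):
        self.items[item["id"]] = dict(item)
        self.change_log.append(dict(item))  # latest version of the item

    def read_changes(self, continuation=0):
        """Return the changes after `continuation` and the new continuation."""
        return self.change_log[continuation:], len(self.change_log)


customers = InMemoryContainer()
sales = InMemoryContainer()
sales.upsert({"id": "order-1", "customerId": "cust-1", "customerName": "Old Name"})


def sync_customer_names(changes):
    # Respond to each change by updating the copy of the name kept in `sales`.
    for changed in changes:
        for order in list(sales.items.values()):
            if order["customerId"] == changed["id"]:
                sales.upsert({**order, "customerName": changed["name"]})


customers.upsert({"id": "cust-1", "name": "New Name"})
changes, continuation = customers.read_changes(0)
sync_customer_names(changes)
print(sales.items["order-1"])
```

Keeping the continuation value lets the listener resume from where it stopped instead of reprocessing every change.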

For instance, you have two entities, sales and customers, that you want to store together in the same container. Both use customerId as the partition key. What should be done to store them together?

Add customerId to each document.

Add a discriminator property, type, with a value of customer or sales, so the two kinds of document can be told apart.
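A minimal sketch of the two document kinds sharing one container, assuming `/customerId` as the partition key. The items and the helper `read_partition` are hypothetical and only mimic a filter on partition key and `type`.

```python
# Hypothetical items stored in one container partitioned on /customerId.
items = [
    {"id": "cust-1", "customerId": "cust-1", "type": "customer", "name": "Ana"},
    {"id": "order-1", "customerId": "cust-1", "type": "sales", "total": 120.0},
    {"id": "order-2", "customerId": "cust-1", "type": "sales", "total": 35.5},
    {"id": "cust-2", "customerId": "cust-2", "type": "customer", "name": "Raj"},
]


def read_partition(customer_id, item_type=None):
    """Return items from one logical partition, optionally filtered by type."""
    return [
        item
        for item in items
        if item["customerId"] == customer_id
        and (item_type is None or item["type"] == item_type)
    ]


everything = read_partition("cust-1")                 # customer plus its sales
orders = read_partition("cust-1", item_type="sales")  # only the sales documents
print(len(everything), [o["id"] for o in orders])
```

The `type` property is what lets a single query return a customer and its sales together, or just one kind of document.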

When are transactions allowed in Azure Cosmos DB?

Azure Cosmos DB supports transactions when the data sits within the same logical partition.

In what ways can transactions be implemented in Azure Cosmos DB?

There are two ways to implement transactions in Azure Cosmos DB –

  1. By using stored procedures
  2. By using a feature called transactional batch, which is available in both the .NET and Java SDKs

In Azure Cosmos DB, what would be the best way of managing referential integrity between different containers?

Use the Azure Cosmos DB change feed with an Azure Functions trigger that listens for changes made to your containers and replicates those changes to the referenced containers.

Consider the following scenario: You’re creating an application that will save the device metrics generated every minute by your different IoT devices. You have identified two entities that you would like to collect data for: the devices and the device metrics generated by each device. You were going to create two containers for these entities, but your data engineer suggests that you place both entities in a single container. What should you do?

Create a document with the deviceid property and the rest of the device data, and add a property called type with the value device. Create a document for each set of collected metrics with the devicemetricsid and deviceid properties and all the metrics data, and add a property called type with the value devicemetrics.

Add a new property to your device documents called metricscollectioncount that holds the current number of metrics collection documents. When your app or Azure function adds a new metrics collection, also add 1 to the metricscollectioncount property inside a transactional batch.
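The all-or-nothing behaviour of adding a metrics document and incrementing the counter together can be sketched with an in-memory stand-in. This is not the SDK's transactional batch API: `execute_batch`, `BatchError`, and `add_metrics` are hypothetical helpers, and the sketch assumes deviceid is the partition key and that every item carries an `id`.

```python
import copy


class BatchError(Exception):
    pass


def execute_batch(store, partition_key, operations):
    """Apply all operations or none; every item must share one partition key."""
    working = copy.deepcopy(store)  # changes are staged on a copy
    for op, item in operations:
        if item["deviceid"] != partition_key:
            raise BatchError("all items must be in the same logical partition")
        if op == "create":
            if item["id"] in working:
                raise BatchError(f"item {item['id']} already exists")
            working[item["id"]] = item
        elif op == "replace":
            if item["id"] not in working:
                raise BatchError(f"item {item['id']} not found")
            working[item["id"]] = item
        else:
            raise BatchError(f"unsupported operation {op}")
    store.clear()
    store.update(working)  # commit only after every operation succeeded


store = {
    "dev-1": {"id": "dev-1", "deviceid": "dev-1", "type": "device",
              "metricscollectioncount": 0},
}


def add_metrics(store, device_id, metrics_id, reading):
    """Create a metrics item and bump the device counter in one batch."""
    device = dict(store[device_id])
    device["metricscollectioncount"] += 1
    metrics = {"id": metrics_id, "deviceid": device_id,
               "type": "devicemetrics", "reading": reading}
    execute_batch(store, device_id, [("create", metrics), ("replace", device)])


add_metrics(store, "dev-1", "m-1", 21.5)
print(store["dev-1"]["metricscollectioncount"])  # 1
```

If any operation in the batch fails, for example because the metrics item already exists, the counter is left unchanged, so the count and the documents cannot drift apart.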

Foolishly Yours,

Avantika Tanubhrt

Happy Learning 🙂


