When I was reading the theory, I never realized how important it is to understand ADLS (Azure Data Lake Storage). For me it had always just been the place where data is stored. ADLS is indeed a storage service within Azure, but it offers different tiers of storage. That matters when you think about architectural design, and so it matters a lot from an interview perspective. Just like the integration runtime, there has not been a single interview where I was not asked about ADLS tiers. The questions can be direct or indirect, the latter being more situational. So, let's deep dive into ADLS tiers.
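In ADLS, there are three tiers of storage that we will look at:
- Hot tier
- Cool tier
- Archive tier
The tiers differ in how long the data is expected to stay in storage and how frequently it is accessed, and the storage and access costs change accordingly. Note that the tier with the 30-day minimum is officially called cool. Azure also has a separate cold tier with a 90-day minimum that sits between cool and archive, which is not covered here.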
HOT ACCESS TIER
This is the tier for data that is accessed frequently. It typically holds data that is waiting to be processed further, or data that will later be moved to the cool tier. If you are wondering whether we can keep all our data in the hot access tier, it is actually possible. However, the cost of storage is the highest in this tier, so it is not preferred for everything. On the other hand, the cost of accessing data is the lowest here.
COOL ACCESS TIER
This tier is meant for data that is stored for a minimum of 30 days. If data is deleted or moved out before the 30 days are up, an early deletion charge has to be paid for the remaining days. Compared to the hot access tier, it has a lower storage cost but a higher access cost. It is often used when we are taking a short-term backup of the data, or holding data while more accumulates for further processing. Data that has become old, but not so old that it will never be needed, and that still has to be available quickly, is usually stored in this tier.
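To make the trade-off concrete, here is a minimal sketch comparing the hot tier with the 30-day tier (the one Azure calls cool). The prices are made up purely for illustration, and `TierPricing`, `monthly_cost` and `early_deletion_charge` are hypothetical names, not part of any Azure SDK. The sketch also assumes the early deletion charge is the storage cost for the remaining days, with a month treated as 30 days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPricing:
    """Hypothetical prices; real Azure prices vary by region and over time."""
    storage_per_gb_month: float  # cost to keep 1 GB for one month
    read_per_gb: float           # cost to read 1 GB back
    min_days: int                # minimum retention before deletion is free


# Made-up numbers, chosen only to show the shape of the trade-off.
HOT = TierPricing(storage_per_gb_month=0.020, read_per_gb=0.00, min_days=0)
COOL = TierPricing(storage_per_gb_month=0.010, read_per_gb=0.01, min_days=30)


def monthly_cost(tier: TierPricing, stored_gb: float, read_gb: float) -> float:
    """Storage cost plus access cost for one month."""
    return stored_gb * tier.storage_per_gb_month + read_gb * tier.read_per_gb


def early_deletion_charge(tier: TierPricing, stored_gb: float, days_stored: int) -> float:
    """Storage cost for the days still missing from the minimum retention."""
    remaining_days = max(0, tier.min_days - days_stored)
    # Assumption: one billing month is treated as 30 days.
    return stored_gb * tier.storage_per_gb_month * remaining_days / 30


if __name__ == "__main__":
    for read_gb in (100, 2000):
        print(read_gb, monthly_cost(HOT, 1000, read_gb), monthly_cost(COOL, 1000, read_gb))
    print(early_deletion_charge(COOL, 1000, days_stored=21))
```

With these made-up prices, 1000 GB that is read lightly is cheaper in cool, while the same data read heavily is cheaper in hot. That is exactly the reasoning an interviewer usually wants to hear.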
ARCHIVE ACCESS TIER
This tier is for data that is stored for a minimum of 180 days. It has a very low storage cost but a high retrieval cost. Data in the archive tier is offline: to access it, one first has to rehydrate the blob, that is, bring it back to an online tier (hot or cool), which can take several hours, and only then read or modify it. This does not mean that you lose all control of the data, though. You can still see that the blob exists and read its metadata and properties, but not its contents. This is the tier into which raw data is often moved, or where long-term backups are kept.
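The archive behaviour described above can be pictured with a small toy model. This is not the Azure SDK: `Blob`, `ArchivedBlobError`, `read` and `rehydrate` are hypothetical names used only to show that metadata stays readable while the content does not, until the blob is moved back to an online tier. Real rehydration is asynchronous and slow, whereas here it is instant.

```python
class ArchivedBlobError(Exception):
    """Raised when the content of an offline (archived) blob is requested."""


class Blob:
    """Toy in-memory stand-in for a blob; not an Azure SDK class."""

    ONLINE_TIERS = {"hot", "cool"}

    def __init__(self, name, content, tier="hot", metadata=None):
        self.name = name
        self._content = content
        self.tier = tier
        # Metadata is kept separately and stays readable in every tier.
        self.metadata = dict(metadata or {})

    def read(self):
        """Return the content, which is only possible in an online tier."""
        if self.tier not in self.ONLINE_TIERS:
            raise ArchivedBlobError(f"{self.name} is in the {self.tier} tier; rehydrate it first")
        return self._content

    def rehydrate(self, target_tier="hot"):
        """Move the blob back to an online tier (instant in this toy model)."""
        if target_tier not in self.ONLINE_TIERS:
            raise ValueError("target tier must be an online tier: hot or cool")
        self.tier = target_tier


if __name__ == "__main__":
    blob = Blob("raw/sales.csv", b"id,amount", tier="archive", metadata={"source": "crm"})
    print(blob.metadata)   # metadata is available even while archived
    try:
        blob.read()
    except ArchivedBlobError as err:
        print(err)
    blob.rehydrate("hot")
    print(blob.read())
```

In the real service the equivalent step is a tier change on the blob. As far as I know, in the `azure-storage-blob` Python package this is done with `set_standard_blob_tier`, but it is worth checking against the SDK docs before relying on it.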
Interview questions on tiers are usually not of the form "what are they?", but rather how to use them and how a plan for storing a given set of data would be implemented. We will discuss that in the next few posts.
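As a small taste of that kind of situational question, here is a rough sketch of how one might reason about picking a tier. The 30-day and 180-day minimums come from the tiers described above; the function name `suggest_tier` and the `frequent_reads` threshold are assumptions of mine, not an Azure rule.

```python
def suggest_tier(expected_days_kept: int, reads_per_month: float,
                 frequent_reads: float = 1.0) -> str:
    """Suggest a tier from expected retention and access frequency.

    frequent_reads is a hypothetical cut-off: at or above this many reads
    per month the data is treated as frequently accessed.
    """
    if reads_per_month >= frequent_reads:
        return "hot"       # frequent access: cheapest to read from hot
    if expected_days_kept >= 180 and reads_per_month == 0:
        return "archive"   # kept long term and never read in normal operation
    if expected_days_kept >= 30:
        return "cool"      # rarely read, but must be available quickly
    return "hot"           # too short-lived: avoids an early deletion charge


if __name__ == "__main__":
    print(suggest_tier(expected_days_kept=7, reads_per_month=50))
    print(suggest_tier(expected_days_kept=60, reads_per_month=0.1))
    print(suggest_tier(expected_days_kept=365, reads_per_month=0))
```

A real decision would also weigh actual prices, data volume and how quickly the data has to be available, but the order of the checks mirrors how the tiers were described above.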
Happy Learning!