Linked Services in ADF

When building an azure pipeline, the foremost important thing is the integration run time and you can learn more about it, in the blog Integration Runtime. Once the integration runtime has been created, the next most important thing is the linked service. If you need to understand the relation between an integration run time and a linked service it is this – an integration run time specifies which type of machine you are going to connect to. Is it going to be a cloud, is it on-premise, or is it a Microsoft SSIS package. Once the type has been specified, it is using the linked service, that you connect with the particular machine. So, for example, you want to connect with another cloud, so you use Azure/Auto Resolve integration run time, but once that has been done, using the linked service you specify which exact machine, which server, which database, you are going to connect to. What will be the type of connection, is it going to be via a managed identity or are you going to make use of Key Vault. All of this happens via the linked service.

It is very important to remember, that although integration runtimes are the first step towards creating an azure data factory pipeline, nothing much can be achieved if you do not have a linked service and then further drill down to the datasets. There are multiple of linked services that you can create via the azure data factory, which means you can connect to multiple of databases/servers via the azure data factory.

In the datafactory page, you will find something like this –

No matter which page you are in, if you are within the azure data factory page, you will definitely find the four boxes that you see here ( the house, the pencil, the circle and the suitcase). It in the pencil or the author page, that the pipeline is actually built. The circle, is where you can monitor your pipelines and the suitcase which is the manage page. So, all the linked services and the integration run times, the git configuration etc, are done in this page. When creating the linked service, click on the suitcase and then click on the Linked Services in the connections tab. The side page would change and you will find the option of creating your new linked service.

For creating a linked service, there are few things that are required –

  1. Integration Run time – This could be set to auto resolve or self hosted or SSIS integration run time depending on what you are connecting to. So, if you are connecting to cloud, choose auto resolve, it if is private network choose self hosted and if it is a SSIS package then well it will be SSIS integration run time. In case there is no IR created, you can create it on the fly as well.
  2. In the production environment, it is mostly preferred or is a practice to use the subscription when connecting with the database. It is only when someone works on anything personally, is the connection string passed in the linked service. The reason for this is that, the name of the subscription often remains the same over different environments – the only thing that changes over the time is, d or t or u or p. These are set as variables in the devops or git via which the code moves to the next environment. Since connection is made via the subscription name, the actual credentials can remain secure.
  3. Authentication type – This is also a very important aspect of creating a linked service. When you do specify the authentication type, you can choose between keyvault, sql authentication or managed identity or service principal identity. We will definitely have a post which talks about managed identity and service principal identity and azure active directory. This difference is very important when it comes to interview preparation. For now, if you choose managed identity or service principal identity, then that is it! There is no other information that is required. However, if you choose SQL authentication type, then you definitely need to specify the connection credentials which is also not the best way of doing it. The other way is the key vault way. So, if there is no managed identity between adf and the database that you want to connect to, you need to create a key vault. Creating a key vault, requires you to first create a secret which holds the credentials for connecting with the database, this secret is then passed to a key vault linked service. So, when you specify the authentication type to be a key vault, you do pass the linked service name for the key vault.

Once all the above have been specified, you can test connection, if it succeeds, create the linked service. While creating the azure data factory pipeline, the linked service is passed on and further datasets are created on it.

Understanding datasets

To brief you about what a dataset is, let us go back to the analogy of integration run time and linked services. By this time, you know that IR is your bigger circle, so the type of database you are going to connect to is specified by IR, the linked service specifies what exact database/storage you are going to connect to and then comes the dataset. It is the most granular part, so which table or which file you are going to use, is what is stored in the dataset. The best practice when creating a dataset is to parameterize it, so you have just one dataset which holds different values in case of different scenarios. This makes it more manageable. Am I not making sense? I mean just imagine you have 100 tables to copy and you are creating 100 datasets for each source and then 100 for each destination! There is no way you can manage that.

Well, for today this is all! We hope you learnt few things. Please let us know in the comment section if you have any doubts.

Happy Learning 🙂



Leave a comment