In every interview I have given, there has not been one that skipped the integration runtime: what it is, why we need it, and how the different types differ. Some questions are direct, others situational. For instance, someone may ask how to copy an on-premises DB to the cloud, and leaving the integration runtime out of that answer is a faux pas for sure!
WHAT IS AN INTEGRATION RUNTIME?
Integration runtime (IR) is the compute infrastructure used by Azure Data Factory (ADF) to provide data integration capabilities such as data flows and data movement. It works across both public networks and hybrid scenarios. In simpler terms, the integration runtime is the person/process that allows data to move in Azure. So, if you are moving your data from on-premises to Azure, you have to set up an integration runtime; if you are moving your data from Azure to Azure, there is an integration runtime to be set up; and if you are moving SSIS packages to Azure, there is an integration runtime for that too. It is rather like a train: you decide from where to where you want to travel, the train is picked accordingly, and the movement happens. In computing terms, the integration runtime defines what kind of hardware is used to execute the activities, where that hardware is physically located, who owns and maintains it, and which data stores and services it can connect to.
WHAT CAN BE DONE WITH AN INTEGRATION RUNTIME?
An integration runtime provides the following data integration capabilities across networks –
- Data flow – You can execute a data flow in a managed Azure compute environment. We will see what a data flow is in an upcoming post.
- Data Movement – This is the most important activity in the BI world, and the IR makes it easier. You can copy data across data stores in a public network and in a private network (on-premises or a virtual private network). It provides support for built-in connectors, format conversion, column mapping, and performant, scalable data transfer.
- Activity Dispatch – With an IR, you can dispatch and monitor transformation activities running on compute environments such as Azure Databricks, HDInsight, etc.
- SSIS Package Execution – ADF, with its specially designed IR for SSIS, makes it very simple to move existing SSIS packages to the cloud.
When creating a linked service, which basically contains the connection strings for your data store, you need to specify the IR that will be used.
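To make the linked service point concrete, here is a minimal sketch of a linked service definition, built as a Python dictionary and dumped to JSON. The names `OnPremSqlLinkedService` and `MySelfHostedIR` and the placeholder connection string are made up for illustration; the part to look at is the `connectVia` block, which is where the IR is referenced.

```python
import json


def build_linked_service(name, store_type, connection_string, ir_name):
    """Return a linked service definition that connects through the given IR."""
    return {
        "name": name,
        "properties": {
            "type": store_type,
            "typeProperties": {"connectionString": connection_string},
            # connectVia names the integration runtime used for this connection
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# Hypothetical names, used only for this example
linked_service = build_linked_service(
    name="OnPremSqlLinkedService",
    store_type="SqlServer",
    connection_string="<your-connection-string>",
    ir_name="MySelfHostedIR",
)
print(json.dumps(linked_service, indent=2))
```

If you leave `connectVia` out of a linked service, ADF falls back to the default auto-resolve Azure IR.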
TYPES OF INTEGRATION RUNTIME
At present there are three types of integration runtimes that Azure has to offer –
- Azure Integration Runtime
- Self-Hosted Integration Runtime
- Azure-SSIS Integration Runtime
Azure Integration Runtime
This IR uses infrastructure and hardware managed by Microsoft. In this case, Microsoft is responsible for the installation, maintenance, patching, and scaling, and you pay for the time of usage. The Azure integration runtime supports connecting to data stores and compute services with publicly accessible endpoints. In ADF, when choosing the integration runtime, an auto-resolve integration runtime is available for Azure by default. Its region is set to auto-resolve, which means that ADF decides where to execute the activity depending on the source, sink, or activity type. If you wish to run your activity in a particular location, you can always create your own Azure IR and specify the region yourself.
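As a rough sketch of what "creating your own IR" means in terms of its definition: to the best of my knowledge, an Azure IR is described in JSON with the type `Managed` and a `computeProperties.location` value, which is either `AutoResolve` or a fixed region. The IR name `AzureIRWestEurope` below is hypothetical, and the exact property layout should be checked against the current Microsoft documentation.

```python
import json


def build_azure_ir(name, location="AutoResolve"):
    """Return an Azure IR definition; location is a region or 'AutoResolve'."""
    return {
        "name": name,
        "properties": {
            # Assumption: Azure IRs use the "Managed" type in their definition
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {"location": location},
            },
        },
    }


# Hypothetical IR pinned to one region instead of auto-resolve
pinned_ir = build_azure_ir("AzureIRWestEurope", location="West Europe")
print(json.dumps(pinned_ir, indent=2))
```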
Self-Hosted Integration Runtime
The self-hosted integration runtime is capable of running data integration between a cloud data store and a data store in a private network. With this IR, the infrastructure and hardware are managed by you. So, patching, scaling, and maintenance are all done by you! It can access resources in both the private and the public space. Since the self-hosted integration runtime makes only outbound HTTP-based connections to the open internet, you can install it in an on-premises environment behind the corporate firewall or inside a virtual private network.
Azure SSIS Integration Runtime
This integration runtime came into the Azure world mainly to lift and shift existing SSIS packages to the cloud. It is managed by Microsoft, and it can access resources in both private and public networks.
THE IR LOCATION
Basic Fact – The data factory instance can be in one region and the IR in a completely different one, and this is not an issue at all!
Azure IR Location
There are two cases with this IR. One is an Azure IR with a fixed region, in which case the activity runs in the region specified by you. The other is the auto-resolve IR, in which case –
- for a copy activity, ADF tries to detect the sink location and uses the IR in the same region or, if that is not available, the closest one in the same geography. If the location of the sink data store cannot be detected, the location of the data factory is used.
- for Lookup, GetMetadata, Delete, and data flow activities, for dispatching transformation activities, and for authoring operations, ADF uses the IR in the data factory location.
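The two rules above can be sketched as a small Python function. This is only a simplified model for reasoning about the behaviour, not how ADF implements it: the function name and parameters are my own, and the "closest available region" fallback is left out.

```python
def resolve_ir_region(activity_type, factory_region, sink_region=None):
    """Simplified model of where an auto-resolve Azure IR runs an activity.

    activity_type  -- e.g. "Copy", "Lookup", "GetMetadata", "Delete"
    factory_region -- region of the data factory
    sink_region    -- region of the sink data store, or None if undetectable
    """
    if activity_type == "Copy":
        # Copy follows the sink; fall back to the factory if it is unknown
        return sink_region if sink_region else factory_region
    # Everything else runs in the data factory region
    return factory_region


print(resolve_ir_region("Copy", "West Europe", sink_region="East US"))
print(resolve_ir_region("Copy", "West Europe"))
print(resolve_ir_region("Lookup", "West Europe", sink_region="East US"))
```

The first call follows the sink to East US, while the other two stay in West Europe, the factory region.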
Self-Hosted IR Location
No location is specified for the self-hosted IR; it is only logically registered to the data factory, and the compute it runs on is provided by you.
Azure SSIS IR Location
It is in this IR that choosing the right location becomes an extremely important decision.
- it does not matter whether the IR location is the same as the data factory location; what is important is that the IR location is the same as the location of your SQL DB or SSIS. This way the data movement happens without excessive cross-location traffic.
- if there is no existing Azure SQL Database or SQL Managed Instance, but you have an on-premises source and destination, you can create the IR in the same location as the virtual network used to connect on-premises with the cloud!
Check out more on this at – MicrosoftDocs
WHEN TO USE WHAT?
- When copying data between two cloud stores that both use an Azure IR, ADF chooses the region for the data transfer.
- The self-hosted integration runtime is used when the data movement is between the cloud and an on-premises store.
- The Azure-SSIS integration runtime is used to lift and shift existing SSIS packages to the cloud and run them there.
- If the data movement is happening between two private networks, the source and the sink must point to the same integration runtime, and that IR is then used for the data movement.
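The rules of thumb above can be written down as a tiny decision helper. This is only an illustrative sketch of this post's guidance: the function name, the `"cloud"`/`"private"` labels, and the returned strings are my own choices, not anything from ADF.

```python
def choose_ir(source_network, sink_network, runs_ssis_packages=False):
    """Pick an IR type from where the source and sink live.

    source_network / sink_network -- "cloud" or "private"
    runs_ssis_packages            -- True when the job is running SSIS packages
    """
    valid = {"cloud", "private"}
    if source_network not in valid or sink_network not in valid:
        raise ValueError("network must be 'cloud' or 'private'")

    if runs_ssis_packages:
        return "Azure-SSIS IR"
    if source_network == "private" or sink_network == "private":
        # Any private endpoint needs a self-hosted IR; when both ends are
        # private, both linked services must point to the same IR
        return "Self-Hosted IR"
    return "Azure IR"


print(choose_ir("cloud", "cloud"))
print(choose_ir("private", "cloud"))
print(choose_ir("private", "private"))
print(choose_ir("cloud", "cloud", runs_ssis_packages=True))
```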
THE EXPERIENCE
In the projects that I have worked on, the IR was usually the responsibility of a client-side team, but there are a few questions I have been asked frequently.
- What is the limit on the number of IRs in ADF?
- There is actually no hard limit on the number of IRs we can have. However, there is a limit on the number of VM cores that the SSIS IR can use per subscription for SSIS package execution.
- What are the types of Integration Run time?
- When do you use a specific Integration runtime?
- What steps are followed for creating an IR?
In the next post, we will see how an IR is created in ADF.
Stay Tuned!
Happy Learning 🙂
