Introduction to Azure Data Factory - Azure Data Factory (2023)

APPLIES TO: Azure Data Factory, Azure Synapse Analytics

In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data lacks the context or meaning needed to give analysts, data scientists, or business decision makers useful insights.

Big data requires a service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.

Usage scenarios

For example, imagine a gaming company that collects petabytes of game logs that are produced by games in the cloud. The company wants to analyze these logs to gain insights into customer preferences, demographics, and usage behavior. It also wants to identify up-sell and cross-sell opportunities, develop compelling new features, drive business growth, and provide a better experience to its customers.

To analyze these logs, the company needs to use reference data such as customer information, game information, and marketing campaign information that is in an on-premises data store. The company wants to utilize this data from the on-premises data store, combining it with additional log data that it has in a cloud data store.

To extract insights, the company wants to process the joined data by using a Spark cluster in the cloud (Azure HDInsight), and publish the transformed data into a cloud data warehouse such as Azure Synapse Analytics so that it can easily build a report on top of it. It also wants to automate this workflow, monitor and manage it on a daily schedule, and run it whenever files land in a blob store container.

Azure Data Factory is the platform that solves such data scenarios. It is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.

Additionally, you can publish your transformed data to data stores such as Azure Synapse Analytics for business intelligence (BI) applications to consume. Ultimately, through Azure Data Factory, raw data can be organized into meaningful data stores and data lakes for better business decisions.

How does it work?

Data Factory contains a series of interconnected systems that provide a complete end-to-end platform for data engineers.

Figure: Overview of the complete Data Factory architecture.

Connect and collect

Enterprises have data of various types (structured, unstructured, and semi-structured) located in disparate sources on-premises and in the cloud, all arriving at different intervals and speeds.

The first step in building an information production system is to connect to all the required sources of data and processing, such as software-as-a-service (SaaS) services, databases, file shares, and FTP and web services. The next step is to move the data as needed to a centralized location for subsequent processing.

Without Data Factory, enterprises must build custom data movement components or write custom services to integrate these data sources and processing. Such systems are expensive and hard to integrate and maintain, and they often lack the enterprise-grade monitoring, alerting, and controls that a fully managed service can offer.

With Data Factory, you can use the Copy Activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis. For example, you can collect data in Azure Data Lake Storage and transform the data later by using an Azure Data Lake Analytics compute service. You can also collect data in Azure Blob storage and transform it later by using an Azure HDInsight Hadoop cluster.
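
A Copy Activity is declared inside a pipeline's JSON definition and names its source and sink datasets rather than embedding connection details. The sketch below builds such a definition as a plain Python dictionary. The activity and dataset names are hypothetical, and the `SqlServerSource` and `ParquetSink` types are assumptions chosen to fit the on-premises-to-cloud example; check the connector documentation for the properties your stores actually need.

```python
import json


def copy_activity(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Build a Copy activity definition shaped like ADF pipeline JSON."""
    return {
        "name": name,
        "type": "Copy",
        # Datasets are referenced by name; their definitions live elsewhere.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Assumed connector types: an on-premises SQL Server table
            # landed as Parquet files in the centralized cloud store.
            "source": {"type": "SqlServerSource"},
            "sink": {"type": "ParquetSink"},
        },
    }


# Hypothetical names for the gaming scenario's reference data.
activity = copy_activity("CopyCustomerInfo", "OnPremCustomerTable", "LakeRawCustomers")
print(json.dumps(activity, indent=2))
```

Keeping datasets as references means the same dataset can feed several activities without repeating its location.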

Transform and enrich

After data is present in a centralized data store in the cloud, process or transform the collected data by using ADF mapping data flows. Data flows enable data engineers to build and maintain data transformation graphs that execute on Spark without needing to understand Spark clusters or Spark programming.

If you prefer to code transformations by hand, ADF supports external activities for executing your transformations on compute services such as HDInsight Hadoop, Spark, Data Lake Analytics, and Machine Learning.

CI/CD and publish

Data Factory offers full support for CI/CD of your data pipelines using Azure DevOps and GitHub. This allows you to incrementally develop and deliver your ETL processes before publishing the finished product. After the raw data has been refined into a business-ready consumable form, load the data into Azure Synapse Analytics, Azure SQL Database, Azure Cosmos DB, or whichever analytics engine your business users can point to from their business intelligence tools.

Monitor

After you have successfully built and deployed your data integration pipeline, providing business value from refined data, monitor the scheduled activities and pipelines for success and failure rates. Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Azure Monitor logs, and health panels on the Azure portal.
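
Success and failure rates are simple ratios over finished runs. A minimal sketch, assuming you have already pulled run statuses (for example through the API or PowerShell) into a list of strings; the status names are assumed to match those shown in the monitoring views, and the sample week of runs is made up.

```python
from collections import Counter

# Assumed terminal statuses of a pipeline run; queued or in-progress runs are skipped.
FINISHED = {"Succeeded", "Failed", "Cancelled"}


def run_summary(statuses: list[str]) -> dict:
    """Summarize finished pipeline runs as a count plus success and failure rates."""
    counts = Counter(s for s in statuses if s in FINISHED)
    total = sum(counts.values())
    if total == 0:
        return {"finished": 0, "success_rate": None, "failure_rate": None}
    return {
        "finished": total,
        "success_rate": counts["Succeeded"] / total,
        "failure_rate": counts["Failed"] / total,
    }


# Hypothetical statuses for one week of a daily pipeline.
week = ["Succeeded", "Succeeded", "Failed", "Succeeded", "InProgress", "Succeeded", "Cancelled"]
print(run_summary(week))
```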

Top-level concepts

An Azure subscription might have one or more Azure Data Factory instances (or data factories). Azure Data Factory is composed of the following key components:

  • Pipelines
  • Activities
  • Datasets
  • Linked services
  • Data Flows
  • Integration Runtimes

These components work together to provide the platform on which you can compose data-driven workflows with steps to move and transform data.

Pipeline

A data factory might have one or more pipelines. A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a pipeline perform a task. For example, a pipeline can contain a group of activities that ingests data from an Azure blob, and then runs a Hive query on an HDInsight cluster to partition the data.

The benefit of this is that the pipeline allows you to manage the activities as a set instead of managing each one individually. The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel.
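
Chaining is expressed in the pipeline JSON through each activity's `dependsOn` list. The sketch below is a hypothetical helper, not part of any Azure SDK, that reads such a definition and groups activities into "waves": activities in the same wave have no dependencies between them and could run in parallel. It treats every dependency as "must finish first" and ignores the specific `dependencyConditions` values.

```python
def execution_waves(pipeline: dict) -> list[list[str]]:
    """Group activities into waves; each wave depends only on earlier waves.

    Activities in the same wave do not depend on each other, so they can run
    in parallel; a dependsOn entry forces sequential execution.
    """
    activities = pipeline["properties"]["activities"]
    deps = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])}
        for a in activities
    }
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(deps):
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("dependsOn contains a cycle or an unknown activity")
        waves.append(ready)
        done.update(ready)
    return waves


# Hypothetical pipeline: two independent ingest steps, then a Hive partition step.
pipeline = {
    "name": "IngestAndPartition",
    "properties": {
        "activities": [
            {"name": "IngestBlob", "type": "Copy"},
            {"name": "IngestLogs", "type": "Copy"},
            {
                "name": "PartitionWithHive",
                "type": "HDInsightHive",
                "dependsOn": [
                    {"activity": "IngestBlob", "dependencyConditions": ["Succeeded"]},
                    {"activity": "IngestLogs", "dependencyConditions": ["Succeeded"]},
                ],
            },
        ]
    },
}

print(execution_waves(pipeline))
```

Here the two Copy activities land in the first wave and can run in parallel, while the Hive step waits for both.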

Mapping data flows

Create and manage graphs of data transformation logic that you can use to transform data of any size. You can build up a reusable library of data transformation routines and execute those processes in a scaled-out manner from your ADF pipelines. Data Factory executes your logic on a Spark cluster that spins up and spins down when you need it, so you never have to manage or maintain clusters.
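
In a pipeline, a data flow runs through an activity that references the data flow by name and sizes the Spark compute for that run. The sketch below shows that shape as a Python dictionary; the names are hypothetical, and the `coreCount` and `computeType` values are assumptions rather than recommendations.

```python
import json


def data_flow_activity(name: str, data_flow: str, core_count: int = 8) -> dict:
    """Build a pipeline activity that runs a mapping data flow on managed Spark."""
    return {
        "name": name,
        "type": "ExecuteDataFlow",
        "typeProperties": {
            # The data flow itself is a separate, reusable resource referenced by name.
            "dataFlow": {"referenceName": data_flow, "type": "DataFlowReference"},
            # Size of the Spark cluster the service spins up for this run (assumed values).
            "compute": {"coreCount": core_count, "computeType": "General"},
        },
    }


activity = data_flow_activity("CleanLogs", "CleanGameLogsFlow")
print(json.dumps(activity, indent=2))
```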

Activity

Activities represent a processing step in a pipeline. For example, you might use a copy activity to copy data from one data store to another data store. Similarly, you might use a Hive activity, which runs a Hive query on an Azure HDInsight cluster, to transform or analyze your data. Data Factory supports three types of activities: data movement activities, data transformation activities, and control activities.
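
To make the three categories concrete, the sketch below maps a few activity `type` values to the category they belong to. The table is a hypothetical, deliberately small subset (the real catalog is much larger), and the helper is illustrative, not part of any SDK.

```python
# Hypothetical lookup table covering a few common activity "type" values;
# check names against the activity documentation before relying on them.
ACTIVITY_CATEGORIES = {
    "Copy": "data movement",
    "HDInsightHive": "data transformation",
    "DatabricksNotebook": "data transformation",
    "ExecuteDataFlow": "data transformation",
    "ExecutePipeline": "control",
    "ForEach": "control",
    "IfCondition": "control",
    "SetVariable": "control",
    "Wait": "control",
}


def categorize(activities: list[dict]) -> dict[str, list[str]]:
    """Group activity names by category; types missing from the table go under 'unknown'."""
    groups: dict[str, list[str]] = {}
    for activity in activities:
        category = ACTIVITY_CATEGORIES.get(activity["type"], "unknown")
        groups.setdefault(category, []).append(activity["name"])
    return groups


steps = [
    {"name": "CopyLogs", "type": "Copy"},
    {"name": "RunHive", "type": "HDInsightHive"},
    {"name": "LoopFiles", "type": "ForEach"},
]
print(categorize(steps))
```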

Datasets

Datasets represent data structures within the data stores, which simply point to or reference the data you want to use in your activities as inputs or outputs.

Linked services

Linked services are much like connection strings: they define the connection information that Data Factory needs to connect to external resources. Think of it this way: a linked service defines the connection to the data source, and a dataset represents the structure of the data. For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account, and an Azure blob dataset specifies the blob container and the folder that contains the data.

Linked services are used for two purposes in Data Factory:

  • To represent a data store that includes, but isn't limited to, a SQL Server database, Oracle database, file share, or Azure blob storage account. For a list of supported data stores, see the copy activity article.

  • To represent a compute resource that can host the execution of an activity. For example, the HDInsightHive activity runs on an HDInsight Hadoop cluster. For a list of transformation activities and supported compute environments, see the transform data article.
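
The split between connection information and data structure shows up directly in the JSON definitions. In this sketch, with hypothetical names and a placeholder connection string, the linked service holds only the connection, while the dataset references that linked service and adds the container and folder. The `unresolved_links` helper is a made-up consistency check, not a Data Factory feature.

```python
# Hypothetical names; the connection string is a placeholder, not a real secret.
linked_service = {
    "name": "GameStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        # Connection information only: where and how to connect.
        "typeProperties": {"connectionString": "<storage-connection-string>"},
    },
}

dataset = {
    "name": "RawGameLogs",
    "properties": {
        "type": "DelimitedText",
        # The dataset reaches the data through the linked service...
        "linkedServiceName": {
            "referenceName": "GameStorageLinkedService",
            "type": "LinkedServiceReference",
        },
        # ...and adds where the data lives: the container and folder.
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "gamelogs",
                "folderPath": "raw",
            }
        },
    },
}


def unresolved_links(datasets: list[dict], linked_services: list[dict]) -> list[str]:
    """Return names of datasets whose referenced linked service is not defined."""
    known = {ls["name"] for ls in linked_services}
    return [
        ds["name"]
        for ds in datasets
        if ds["properties"]["linkedServiceName"]["referenceName"] not in known
    ]


print(unresolved_links([dataset], [linked_service]))  # empty: every reference resolves
```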

Integration Runtime

In Data Factory, an activity defines the action to be performed, and a linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and linked services. It's referenced by the linked service or activity, and provides the compute environment where the activity either runs or gets dispatched from. This way, the activity can be performed in the region closest to the target data store or compute service, in the most performant way, while meeting security and compliance needs.
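
A linked service selects its integration runtime through a `connectVia` reference; when none is given, the default Azure integration runtime (`AutoResolveIntegrationRuntime`) is used. The sketch below, with hypothetical linked service and runtime names and placeholder connection strings, shows an on-premises SQL Server reached through a self-hosted runtime next to a cloud store that relies on the default.

```python
DEFAULT_AZURE_IR = "AutoResolveIntegrationRuntime"


def integration_runtime_for(linked_service: dict) -> str:
    """Name of the integration runtime a linked service uses.

    Without a connectVia reference, the default Azure integration runtime applies.
    """
    connect_via = linked_service["properties"].get("connectVia")
    return connect_via["referenceName"] if connect_via else DEFAULT_AZURE_IR


# Hypothetical on-premises SQL Server reached through a self-hosted runtime.
on_prem_sql = {
    "name": "OnPremSqlLinkedService",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "<sql-server-connection-string>"},
        "connectVia": {
            "referenceName": "SelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}

# Hypothetical cloud store with no connectVia: uses the default Azure runtime.
cloud_blob = {
    "name": "GameStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<storage-connection-string>"},
    },
}

for ls in (on_prem_sql, cloud_blob):
    print(ls["name"], "->", integration_runtime_for(ls))
```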

Triggers

Triggers represent the unit of processing that determines when a pipeline execution needs to be kicked off. There are different types of triggers for different types of events.
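
The daily schedule from the gaming scenario could be expressed with a schedule trigger. The sketch shows the trigger's JSON shape with hypothetical names and start time, plus a small helper that lists the first few start times. The helper is an approximation that handles only day-based recurrences; the service evaluates the real schedule itself.

```python
from datetime import datetime, timedelta

# Hypothetical daily trigger that starts a pipeline with one argument.
trigger = {
    "name": "DailyLogTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2023-01-01T02:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "ProcessGameLogs",
                    "type": "PipelineReference",
                },
                "parameters": {"inputFolder": "raw"},
            }
        ],
    },
}


def first_fire_times(trigger: dict, count: int) -> list[datetime]:
    """Approximate the first few start times of a day-based schedule trigger.

    Handles only frequency == "Day"; the real scheduler supports more options.
    """
    rec = trigger["properties"]["typeProperties"]["recurrence"]
    if rec["frequency"] != "Day":
        raise NotImplementedError("this sketch only covers daily recurrences")
    # Older Python versions' fromisoformat() rejects a trailing "Z".
    start = datetime.fromisoformat(rec["startTime"].replace("Z", "+00:00"))
    step = timedelta(days=rec["interval"])
    return [start + i * step for i in range(count)]


for t in first_fire_times(trigger, 3):
    print(t.isoformat())
```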

Pipeline runs

A pipeline run is an instance of the pipeline execution. Pipeline runs are typically instantiated by passing the arguments to the parameters that are defined in pipelines. The arguments can be passed manually or within the trigger definition.

Parameters

Parameters are key-value pairs of read-only configuration. Parameters are defined in the pipeline. The arguments for the defined parameters are passed during execution from the run context that was created by a trigger or a pipeline that was executed manually. Activities within the pipeline consume the parameter values.
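
The flow from run arguments to activity values can be illustrated with a toy resolver. This sketch assumes parameters with an optional `defaultValue` and handles only values that are exactly `@pipeline().parameters.<name>`; the real expression language also supports functions and string interpolation, which this code does not attempt. All names and values are hypothetical.

```python
import re
from types import MappingProxyType

# Matches a value that is exactly one pipeline parameter reference.
PARAM_REF = re.compile(r"^@pipeline\(\)\.parameters\.(\w+)$")


def bind_arguments(param_defs: dict, arguments: dict) -> MappingProxyType:
    """Combine run arguments with parameter defaults into a read-only mapping."""
    bound = {}
    for name, spec in param_defs.items():
        if name in arguments:
            bound[name] = arguments[name]
        elif "defaultValue" in spec:
            bound[name] = spec["defaultValue"]
        else:
            raise ValueError(f"no argument or default for parameter {name!r}")
    return MappingProxyType(bound)  # read-only view, like pipeline parameters


def resolve(value, params):
    """Replace whole-string parameter references anywhere in a nested value."""
    if isinstance(value, str):
        match = PARAM_REF.match(value)
        return params[match.group(1)] if match else value
    if isinstance(value, dict):
        return {k: resolve(v, params) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, params) for v in value]
    return value


# Hypothetical pipeline parameters and the arguments a trigger might pass.
param_defs = {
    "inputFolder": {"type": "String", "defaultValue": "raw"},
    "runDate": {"type": "String"},
}
run_params = bind_arguments(param_defs, {"runDate": "2023-01-23"})
activity_props = {
    "folderPath": "@pipeline().parameters.inputFolder",
    "partition": "@pipeline().parameters.runDate",
}
print(resolve(activity_props, run_params))
```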

A dataset is a strongly typed parameter and a reusable/referenceable entity. An activity can reference datasets and can consume the properties that are defined in the dataset definition.

A linked service is also a strongly typed parameter that contains the connection information to either a data store or a compute environment. It is also a reusable/referenceable entity.

Control flow

Control flow is an orchestration of pipeline activities that includes chaining activities in a sequence, branching, defining parameters at the pipeline level, and passing arguments while invoking the pipeline on-demand or from a trigger. It also includes custom-state passing and looping containers, that is, For-each iterators.
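
A For-each iterator is a `ForEach` activity whose `items` expression supplies the list to loop over, with `@item()` referring to the current element inside the loop. The sketch below, with hypothetical names, defines one that calls a child pipeline per file and then expands the iterations locally to show what each one receives. `isSequential` and `batchCount` control whether iterations run in order or in parallel; the local expansion only substitutes `@item()` and does not schedule anything.

```python
# Hypothetical ForEach activity that runs a child pipeline once per file.
for_each = {
    "name": "ProcessEachFile",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": False,  # allow parallel iterations...
        "batchCount": 4,  # ...but at most four at a time
        "items": {"value": "@pipeline().parameters.files", "type": "Expression"},
        "activities": [
            {
                "name": "ProcessOneFile",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {"referenceName": "ProcessFile", "type": "PipelineReference"},
                    "parameters": {"fileName": "@item()"},
                },
            }
        ],
    },
}


def substitute_item(value, item):
    """Replace exact "@item()" strings anywhere in a nested value (builds new objects)."""
    if isinstance(value, str):
        return item if value == "@item()" else value
    if isinstance(value, dict):
        return {k: substitute_item(v, item) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_item(v, item) for v in value]
    return value


def expand_iterations(activity: dict, items: list) -> list[list[dict]]:
    """Return the inner activities once per item, with @item() filled in."""
    inner = activity["typeProperties"]["activities"]
    return [substitute_item(inner, item) for item in items]


# The file list would normally arrive as the pipeline's "files" argument.
iterations = expand_iterations(for_each, ["day1.csv", "day2.csv"])
for acts in iterations:
    print(acts[0]["typeProperties"]["parameters"])
```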

Variables

Variables can be used inside of pipelines to store temporary values and can also be used in conjunction with parameters to enable passing values between pipelines, data flows, and other activities.
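
The difference between read-only parameters and mutable variables can be sketched with a toy run context. The `SetVariable` activity shape follows the pipeline JSON; the `RunState` class, the names, and the choice to start variables as `None` are assumptions made for illustration.

```python
# Hypothetical pipeline fragment: a declared variable and a SetVariable activity.
pipeline_variables = {"lastProcessedFolder": {"type": "String"}}

set_variable = {
    "name": "RememberFolder",
    "type": "SetVariable",
    "typeProperties": {"variableName": "lastProcessedFolder", "value": "raw/2023-01-23"},
}


class RunState:
    """Toy run context: parameters are only read, variables are updated by SetVariable."""

    def __init__(self, parameters: dict, variable_defs: dict):
        self.parameters = dict(parameters)
        # Variables start unset (None) in this sketch.
        self.variables = {name: None for name in variable_defs}

    def apply(self, activity: dict) -> None:
        """Apply a SetVariable activity; other activity types are ignored here."""
        if activity["type"] != "SetVariable":
            return
        props = activity["typeProperties"]
        name = props["variableName"]
        if name not in self.variables:
            raise KeyError(f"variable {name!r} is not declared on the pipeline")
        self.variables[name] = props["value"]


state = RunState({"inputFolder": "raw"}, pipeline_variables)
state.apply(set_variable)
print(state.variables)
```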

Next steps

Here are important next step documents to explore:

  • Dataset and linked services
  • Pipelines and activities
  • Integration runtime
  • Mapping Data Flows
  • Data Factory UI in the Azure portal
  • Copy Data tool in the Azure portal
  • PowerShell
  • .NET
  • Python
  • REST
  • Azure Resource Manager template


Which three types of activities can you run in Microsoft Azure Data Factory?

Data Factory supports three types of activities: data movement activities, data transformation activities, and control activities.

What are the interview questions for Azure Data Factory?

Basic interview questions:
  • Why do we need Azure Data Factory?
  • What is Azure Data Factory?
  • What is the integration runtime?
  • What is the limit on the number of integration runtimes?
  • What is the difference between Azure Data Lake and Azure Data Warehouse?
  • What is blob storage in Azure?

Does ADF require coding?

Code-free Data Flow – Azure Data Factory enables any developer to accelerate the development of data transformations with code-free data flows. By using the ADF Studio, any developer can design data transformation without writing any code.

What is Azure Data Factory?

Azure Data Factory is Azure's cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management. You can also lift and shift existing SSIS packages to Azure and run them with full compatibility in ADF.

Is Azure Data Factory an ETL tool?

With Azure Data Factory, it's fast and easy to build code-free or code-centric ETL and ELT processes.

How many pipelines can an Azure Data Factory have?

A data factory or Synapse workspace can have one or more pipelines.

Is Azure Data Factory easy to learn?

With Azure Data Factory, it is fast and easy to build code-free or code-centric ETL and ELT processes, and you can create code-free pipelines within an intuitive visual environment.

Is Azure Data Factory a database?

Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. ADF does not store any data itself.

How many types of activities are in Azure Data Factory?

Data Factory supports three types of activities: data movement activities, data transformation activities, and control activities. Each activity can have zero or more input datasets and produce one or more output datasets.

What is the difference between Azure Data Factory and Azure Data Lake?

ADF transforms, schedules, and loads data as the project requires, whereas Azure Data Lake is massively scalable and secure data lake storage for storing optimized workloads. It can store structured, semi-structured, and unstructured data seamlessly.

What are triggers in ADF?

Triggers are another way to execute a pipeline run. A trigger represents a unit of processing that determines when a pipeline execution needs to be kicked off. The service supports three types of triggers:
  • Schedule trigger: invokes a pipeline on a wall-clock schedule.
  • Tumbling window trigger: operates on a periodic interval while retaining state.
  • Event-based trigger: responds to an event, such as a file landing in a blob store container.

Is Azure Data Factory PaaS or SaaS?

Azure Data Factory (ADF) is a Microsoft Azure PaaS solution for data transformation and loading. ADF supports data movement between many on-premises and cloud data sources.

How do I deploy Azure Data Factory?

Deploy the template
  1. Subscription: Select an Azure subscription.
  2. Resource group: Select Create new, enter a unique name for the resource group, and then select OK.
  3. Region: Select a location. ...
  4. Data Factory Name: Use default value.
  5. Location: Use default value.
  6. Storage Account Name: Use default value.

What are the top-level concepts of Azure Data Factory?

Pipeline. An ADF pipeline is the top-level concept that you work with most directly. Pipelines are composed of activities and data flow arrows.

Can you work with Azure Data Factory from Python?

Yes. You can create a data factory by using Python; for example, the Python quickstart creates a data factory whose pipeline copies data from one folder to another folder in Azure Blob storage.

What is the difference between a pipeline and a data flow in ADF?

A pipeline is a logical grouping of activities that together perform a unit of work; it orchestrates which steps run and in what order. A mapping data flow is a graph of data transformation logic that Data Factory executes on a Spark cluster it manages for you, and a pipeline runs it as one of its activities.

What language does Azure Data Factory use?

Language support includes .NET, PowerShell, Python, and REST. For monitoring, you can use PowerShell, the SDKs, or the visual monitoring tools in the browser user interface.

What connects an Azure Data Factory activity to a dataset?

The Azure Storage and Azure SQL Database linked services contain connection strings that Data Factory uses at runtime to connect to your Azure Storage and Azure SQL Database, respectively. The Azure Blob dataset specifies the blob container and blob folder that contains the input blobs in your Blob storage.

How can you practice Azure Data Factory for free?

There is no sandbox environment, but you can use an Azure free trial subscription with $200 of credit to create a data factory and explore a few activities.

What are the limitations of Azure Data Factory?

Version 2 limits include, for example:
  • Total number of entities, such as pipelines, datasets, triggers, linked services, Private Endpoints, and integration runtimes, within a data factory: default limit 5,000; maximum limit: contact support.
  • Total CPU cores for Azure-SSIS Integration Runtimes under one subscription: default limit 64; maximum limit: contact support.

What is the purpose of Data Factory?

Data Factory in Azure is a data integration system that allows users to move data between on-premises and cloud systems, as well as schedule data flows.

Article information

Author: Clemencia Bogisich Ret

Last Updated: 01/23/2023
