17 May 2023

CI/CD for Azure Synapse Analytics

Azure Synapse Analytics, the next evolution of Azure SQL Data Warehouse, has taken the BI world by storm since its announcement in November 2019. Consulting firms are shifting their attention to a serverless SQL pool architecture. This architecture lets them offer clients a mature and performant logical data warehouse that can store structured as well as unstructured data. Thanks to its pay-for-what-you-use model, it does so at a fraction of the cost of other prevalent technologies.

Figure: Azure Synapse serverless architecture

However, this shift does not come without challenges. Serverless SQL and Spark pools bring new syntax and new best practices with regard to Continuous Integration and Continuous Deployment (CI/CD).

These days, when even ChatGPT cannot come up with a helpful answer, you know there is a gap worth writing about. That is why, in this article, we will address:

  1. Some of the most important differences between CI/CD for Azure Data Factory (ADF) and CI/CD for Azure Synapse Analytics (ASA).
  2. Insights into two approaches we examined for implementing CI/CD in Synapse Analytics.

CI/CD ASA versus ADF: most important differences

  1. Synapse artifacts (e.g., linked services, datasets, …) are not Resource Manager resources! As a result, the ARM template deployment task cannot be used to deploy them. Fortunately, the Synapse workspace deployment task can (and should!) be installed and used in its place. ARM templates can still be used to deploy the actual infrastructure, such as workspaces and pools.
  2. Some very handy ADF features are currently unavailable in ASA. These include:
    • Global parameters. These are constants across a data factory that any pipeline can consume in any expression. They come in handy when you want to tweak pipeline behavior per environment. Workarounds to mimic them in ASA are available, such as a ‘global parameters’ file uploaded to the data lake.
    • ARM parameter configuration capabilities, which allow the parameterization of almost any artifact involved in the CI/CD process. Achieving the same in ASA currently involves (sometimes cumbersome) workarounds, such as setting up procedures and notebooks so that their behavior can be fine-tuned to the needs of each environment.
  3. The use of a service connection, which defines a service principal that connects Azure DevOps to the Azure subscription in which our Synapse workspace resides.
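
The ‘global parameters’ file workaround could look roughly like the minimal Python sketch below, for example as a cell in a Synapse notebook. We assume one JSON file per environment on the data lake. The keys, the values and the helper names (`load_global_parameters`, `get_parameter`) are hypothetical. Reading the file from storage is left out, so its contents are inlined as a string.

```python
import json

_MISSING = object()  # sentinel so that None can still be passed as a legitimate default

# Hypothetical contents of a per-environment 'global parameters' file on the data lake.
# Keys and values are illustrative only; this is not a Synapse convention.
EXAMPLE_FILE = """
{
  "environment": "tst",
  "sourceApiUrl": "https://example.invalid/tst/api",
  "loadBatchSize": 5000,
  "enableFullLoad": false
}
"""


def load_global_parameters(raw_json: str) -> dict:
    """Parse the text of a global parameters file into a dict."""
    params = json.loads(raw_json)
    if not isinstance(params, dict):
        raise ValueError("a global parameters file must contain a JSON object")
    return params


def get_parameter(params: dict, name: str, default=_MISSING):
    """Look up one value, loosely mimicking @pipeline().globalParameters.<name> in ADF."""
    if name in params:
        return params[name]
    if default is not _MISSING:
        return default
    raise KeyError(
        f"global parameter {name!r} is not defined for environment {params.get('environment')!r}"
    )


params = load_global_parameters(EXAMPLE_FILE)
batch_size = get_parameter(params, "loadBatchSize")
full_load = get_parameter(params, "enableFullLoad")
print(params["environment"], batch_size, full_load)  # tst 5000 False
```

Every environment gets its own copy of the file with the same keys. As a result, the notebook code stays identical across development, test and production, and only the uploaded file differs.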

Two approaches to implementing CI/CD for Azure Synapse Analytics

When deciding how to implement CI/CD in ASA, the main consideration can be summed up as: ‘To YAML or not to YAML’.

In other words, you should ask yourself whether you would prefer to develop your CI/CD pipeline solely based on code (i.e., ‘To YAML’) or whether to use the DevOps Releases User Interface (i.e., ‘Not to YAML’). Both come with advantages and disadvantages.

  1. Code (YAML) based integration and deployment:

In this CI/CD scenario, a YAML file contains all the necessary instructions as code for packaging the artifacts and deploying them to an environment of choice.

An example of such a YAML file is the ‘azure-pipelines-ci-cd-synapse-artifacts.yml’ file in the publicly available Git repository of Ryoma Nagata (Microsoft MVP).

Advantages of this approach are:

  • a faster setup (less clicking, more code);
  • version control of the YAML (integration and deployment) file;
  • parameterization, itself versioned, via .json files.
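
Here is a minimal Python sketch that makes parameterization via .json files more concrete. It assumes the pipeline keeps one ARM-style parameters file plus a small override file per environment. The file name `overrides.prd.json`, the parameter names, the URLs and the `apply_overrides` helper are hypothetical. A script step could merge the files before the deployment task runs. This is one possible pattern, not the method used in the repository mentioned above.

```python
import copy
import json

# Base parameters in the standard ARM parameters-file shape: a top-level "parameters"
# object whose entries carry a "value". Parameter names and URLs are made up.
BASE_PARAMETERS = json.loads("""
{
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": {"value": "syn-dev"},
    "ls_datalake_url": {"value": "https://stdev.dfs.core.windows.net"}
  }
}
""")

# Hypothetical override file for production (e.g. overrides.prd.json), versioned with the YAML.
PRD_OVERRIDES = json.loads("""
{
  "workspaceName": "syn-prd",
  "ls_datalake_url": "https://stprd.dfs.core.windows.net"
}
""")


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of the parameters file with environment-specific values applied."""
    result = copy.deepcopy(base)  # never mutate the shared base file
    params = result["parameters"]
    unknown = sorted(set(overrides) - set(params))
    if unknown:
        # An override for a parameter the template does not declare is most likely a typo.
        raise KeyError(f"overrides for undeclared parameters: {unknown}")
    for name, value in overrides.items():
        params[name]["value"] = value
    return result


prd = apply_overrides(BASE_PARAMETERS, PRD_OVERRIDES)
print(json.dumps(prd, indent=2))
```

Because both the base file and the override files live in the repository, every change to an environment's configuration shows up in version history next to the YAML itself.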

A disadvantage of this approach is that you need to understand YAML to make changes to the CI/CD workflow. Without prior YAML experience, this can be challenging.

  2. UI / Releases workflow:

In this CI/CD scenario, we will not use a YAML script to describe our CI/CD workflow. Instead, we will add agent jobs and tasks in the Azure DevOps Pipelines & Releases components (user interface).

An advantage of this is that the CI/CD workflow can be managed visually by putting tasks in the right sequence in the user interface.

Disadvantages of this setup are that:

  • it takes a bit more time to build than the code-based approach;
  • versioning is more difficult, since there are no files whose versions we can track.

Figure: Synapse CI/CD approaches

Both approaches are valid ways to implement CI/CD on Azure Synapse Analytics. Choosing between them boils down to your affinity with, and experience in, YAML files and their perks.

Thanks for reading, and happy deploying!
