title | description | author | date | file |
---|---|---|---|---|
Azure Purview Deployment Checklist | Azure Purview Deployment Checklist | Zeinab Mokhtarian Koorabbasloo | 05/05/2021 | Readme.md |
Azure Purview is a unified data governance service that helps you manage and govern your on-premises, multi-cloud, and software-as-a-service (SaaS) data. Using Azure Purview, you can easily create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage. You can also empower data consumers to find valuable, trustworthy data.
To deploy Azure Purview successfully, you need to plan carefully and complete all the required steps. This includes registering and scanning all your Azure data sources so that you can build unified data governance across your organization's data. In some cases, you may need to work with other teams in your organization to prepare your environment. This guide and the accompanying scripts are intended to help you achieve this goal.
These PowerShell scripts are intended to help Azure subscription owners and Azure Purview data source administrators identify the required access and set up the required authentication and network rules for an Azure Purview account across Azure data sources.
Azure Purview Readiness Checklist is a checklist of high-level steps to guide you to plan and deploy Azure Purview as your data governance solution. The guide is divided into four phases:
- Readiness – Learn the prerequisite tools and approaches important to all adoption efforts.
- Build Foundation – Deploy Azure Purview accounts to establish your unified data governance model.
- Register Data Sources – Set up the first landing zone and onboard an initial group of data sources.
- Curate and Consume Data – Enable a unified data governance solution for data consumers using Azure Purview.
To help you prepare your environment, we provide the following scripts. They evaluate your existing environment and identify missing configurations that would prevent Azure Purview from scanning your data sources.
Use either of the following PowerShell-based scripts to validate your data sources' readiness before scanning them in Azure Purview:
- Azure-Purview-automated-readiness-checklist.ps1
- Azure-Purview-automated-readiness-checklist-csv-Input.ps1
Script | Use cases | Prerequisites | Required Permissions |
---|---|---|---|
Azure-Purview-automated-readiness-checklist.ps1 | | | |
Azure-Purview-automated-readiness-checklist-csv-Input.ps1 | | | |
To scan data sources, Azure Purview needs access to them, and it gets that access through credentials. A credential is authentication information that Azure Purview uses to authenticate to your registered data sources. There are a few options for setting up credentials for Azure Purview. These include using the Managed Identity assigned to the Purview account, using a Key Vault, or using a service principal.
The automated readiness checklist currently supports Managed Identity only.
Currently, the following data sources are supported in the script:
- Azure Blob Storage (BlobStorage)
- Azure Data Lake Storage Gen 2 (ADLSGen2)
- Azure Data Lake Storage Gen 1 (ADLSGen1)
- Azure SQL Database (AzureSQLDB)
- Azure SQL Managed Instance (AzureSQLMI)
- Azure Synapse (Synapse)
You can choose all or any of these data sources as input when running the script.
Azure Blob Storage (BlobStorage):
- RBAC: Verify whether the Azure Purview MSI has the 'Storage Blob Data Reader' role in each of the subscriptions below the selected scope.
- RBAC: Verify whether the Azure Purview MSI has the 'Reader' role on the selected scope.
- Service Endpoint: Verify whether Service Endpoint is ON, and check whether 'Allow trusted Microsoft services to access this storage account' is enabled.
- Networking: Check whether a Private Endpoint is created for the storage account and enabled for Blob.
Azure Data Lake Storage Gen 2 (ADLSGen2):
- RBAC: Verify whether the Azure Purview MSI has the 'Storage Blob Data Reader' role in each of the subscriptions below the selected scope.
- RBAC: Verify whether the Azure Purview MSI has the 'Reader' role on the selected scope.
- Service Endpoint: Verify whether Service Endpoint is ON, and check whether 'Allow trusted Microsoft services to access this storage account' is enabled.
- Networking: Check whether a Private Endpoint is created for the storage account and enabled for Blob Storage.
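The network portion of the Blob Storage and ADLS Gen2 checks above boils down to a small decision rule. The following Python sketch encodes it under stated assumptions. The `StorageNetworkConfig` fields are hypothetical stand-ins for values you would read from the storage account, not Azure API property names. The rule mirrors the checklist wording rather than the script's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class StorageNetworkConfig:
    # Hypothetical, simplified view of a storage account's network settings.
    private_endpoint_for_blob: bool   # a Private Endpoint exists for the Blob sub-resource
    service_endpoint_on: bool         # Service Endpoint (virtual network rules) is configured
    trusted_services_allowed: bool    # 'Allow trusted Microsoft services ...' is enabled


def storage_readiness(cfg: StorageNetworkConfig) -> str:
    """Classify a storage account using the Blob / ADLS Gen2 network checks."""
    if cfg.private_endpoint_for_blob:
        return "ready: Private Endpoint enabled for Blob"
    if cfg.service_endpoint_on and cfg.trusted_services_allowed:
        return "ready: Service Endpoint on and trusted Microsoft services allowed"
    if cfg.service_endpoint_on:
        return ("action: enable 'Allow trusted Microsoft services "
                "to access this storage account'")
    return "review: no Private Endpoint or Service Endpoint found"


if __name__ == "__main__":
    print(storage_readiness(StorageNetworkConfig(False, True, False)))
```

In the Azure Resource Manager model, the trusted-services option corresponds to `AzureServices` in the storage account's network rule `bypass` setting. That is where a real check would read it from.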
Azure Data Lake Storage Gen 1 (ADLSGen1):
- Networking: Verify whether Service Endpoint is ON, and check whether 'Allow all Azure services to access this Data Lake Storage Gen1 account' is enabled.
- Permissions: Verify whether the Azure Purview MSI has Read/Execute access.
Azure SQL Database (AzureSQLDB):
- SQL Servers:
  - Network: Verify whether a Public or Private Endpoint is enabled.
  - Firewall: Verify whether 'Allow Azure services and resources to access this server' is enabled.
  - Azure AD Admin: Check whether the Azure SQL Server has Azure AD authentication configured.
  - AAD Admin: List the Azure SQL Server Azure AD admin user or group.
- SQL Databases:
  - SQL Role: Check whether the Azure Purview MSI has the db_datareader role.
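The firewall check for Azure SQL Database can be illustrated with a short Python sketch. It relies on one documented behavior: enabling 'Allow Azure services and resources to access this server' shows up as a server firewall rule whose start and end IP addresses are both 0.0.0.0. The `start_ip`/`end_ip` dictionary keys are hypothetical. Adapt them to however you export the server's firewall rules.

```python
from ipaddress import ip_address

ALL_ZEROS = ip_address("0.0.0.0")


def allows_azure_services(firewall_rules: list[dict]) -> bool:
    """Return True if the 0.0.0.0-0.0.0.0 rule (the 'Allow Azure services' setting) exists."""
    for rule in firewall_rules:
        start = ip_address(rule["start_ip"])
        end = ip_address(rule["end_ip"])
        if start == ALL_ZEROS and end == ALL_ZEROS:
            return True
    return False


if __name__ == "__main__":
    rules = [
        {"name": "office", "start_ip": "203.0.113.0", "end_ip": "203.0.113.255"},
        {"name": "AllowAllWindowsAzureIps", "start_ip": "0.0.0.0", "end_ip": "0.0.0.0"},
    ]
    print(allows_azure_services(rules))
```

The sketch matches on the IP range rather than the rule name, because the range is what grants access.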
Azure SQL Managed Instance (AzureSQLMI):
- SQL Managed Instance Servers:
  - Network: Verify whether a Public or Private Endpoint is enabled.
  - ProxyOverride: Verify whether the Azure SQL Managed Instance is configured as Proxy or Redirect.
  - Networking: Verify whether the NSG has an inbound rule that allows AzureCloud over the required ports (Redirect: 1433 and 11000-11999; Proxy: 3342).
  - Azure AD Admin: Check whether the Azure SQL Managed Instance has Azure AD authentication configured.
  - AAD Admin: List the Azure SQL Managed Instance Azure AD admin user or group.
- SQL Databases:
  - SQL Role: Check whether the Azure Purview MSI has the db_datareader role.
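For SQL Managed Instance, the required inbound ports depend on the connection type. The Python sketch below maps the ProxyOverride setting to the port ranges listed above. It then reports which ranges are not covered by an inbound Allow rule from the AzureCloud service tag. The rule dictionaries are a hypothetical, simplified representation. The sketch ignores rule priority, Deny rules, and ranges split across several rules, so treat it as a first-pass check rather than a full NSG evaluation.

```python
# Port ranges from the checklist: Redirect needs 1433 and 11000-11999, Proxy needs 3342.
REQUIRED_PORTS = {
    "Redirect": [(1433, 1433), (11000, 11999)],
    "Proxy": [(3342, 3342)],
}


def missing_port_ranges(connection_type: str, inbound_rules: list[dict]) -> list[tuple[int, int]]:
    """Return required (low, high) ranges not fully covered by a single AzureCloud Allow rule."""
    allowed = [
        (rule["port_from"], rule["port_to"])
        for rule in inbound_rules
        if rule["access"] == "Allow" and rule["source"] == "AzureCloud"
    ]
    return [
        (low, high)
        for low, high in REQUIRED_PORTS[connection_type]
        if not any(a_low <= low and high <= a_high for a_low, a_high in allowed)
    ]


if __name__ == "__main__":
    rules = [{"access": "Allow", "source": "AzureCloud", "port_from": 1433, "port_to": 1433}]
    print(missing_port_ranges("Redirect", rules))
```

A connection type other than "Proxy" or "Redirect" raises `KeyError`. This is deliberate in the sketch, so an unexpected ProxyOverride value is not silently treated as compliant.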
Azure Synapse (Synapse):
- RBAC: Verify whether the Azure Purview MSI has the 'Storage Blob Data Reader' role in each of the subscriptions below the selected scope.
- RBAC: Verify whether the Azure Purview MSI has the 'Reader' role on the selected scope.
- SQL Servers (dedicated pools):
  - Network: Verify whether a Public or Private Endpoint is enabled.
  - Firewall: Verify whether 'Allow Azure services and resources to access this server' is enabled.
  - Azure AD Admin: Check whether the Azure SQL Server has Azure AD authentication configured.
  - AAD Admin: List the Azure SQL Server Azure AD admin user or group.
- SQL Databases:
  - SQL Role: Check whether the Azure Purview MSI has the db_datareader role.
As the scope for your data sources, you can select a top-level Management Group or a Subscription, or you can use a CSV file that contains a subset of your Subscriptions. If you select a Management Group, the readiness check script runs on all subscriptions inside the Management Group, including those in child Management Groups. If you select Subscriptions as the data source scope, the script runs only on the resources in the selected subscriptions.
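Selecting a Management Group expands the scope to every subscription in that group and in all of its child groups. The following Python sketch shows that expansion as a depth-first walk over an in-memory hierarchy. The `ManagementGroup` class and the sample names are hypothetical. In practice, the hierarchy would come from Azure rather than being built by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ManagementGroup:
    name: str
    subscriptions: list[str] = field(default_factory=list)
    children: list[ManagementGroup] = field(default_factory=list)


def subscriptions_in_scope(group: ManagementGroup) -> list[str]:
    """Collect the group's own subscriptions, then those of every descendant group."""
    result = list(group.subscriptions)
    for child in group.children:
        result.extend(subscriptions_in_scope(child))
    return result


if __name__ == "__main__":
    root = ManagementGroup(
        "contoso-root",
        subscriptions=["sub-shared"],
        children=[
            ManagementGroup("contoso-dev", ["sub-dev"],
                            [ManagementGroup("contoso-dev-data", ["sub-dev-data"])]),
            ManagementGroup("contoso-prod", ["sub-prod"]),
        ],
    )
    print(subscriptions_in_scope(root))
```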
The following prerequisites are required to successfully run the script:
- An Azure Subscription.
- An Azure Purview Account.
- Access to Azure AD and Azure Subscriptions where Azure Purview account and data sources are deployed.
- If a CSV file is used, valid user credentials must be stored as Key Vault secrets in each subscription. The script uses these credentials to validate Azure Purview access to Azure SQL Servers, Azure Synapse, or Azure SQL Managed Instances.
The following minimum permissions are needed to run the script in your Azure environment:
Role | Scope |
---|---|
Global Reader | Azure AD Tenant |
Reader | Management Group or Subscription where your Azure Data Sources reside |
Reader | Subscription where Azure Purview Account is created |
SQL Admin (Azure AD Authentication) | Azure SQL Servers or Azure SQL Managed Instances |
Access to your Azure Key Vault | Get/List access to the Key Vault secrets used for Azure SQL DB, SQL MI, or Synapse authentication |
This script requires Azure PowerShell Az Modules.
Use the steps in the following examples to run the scripts.
Interactive example (Azure-Purview-automated-readiness-checklist.ps1):
- Run the Azure-Purview-automated-readiness-checklist.ps1 script in a PowerShell console and select any of the desired options as data source types.
- Type "y" to clear the context.
- Sign in to Azure AD and Azure when prompted.
- Type the name of the Azure subscription that contains your Azure Purview account.
- Type the Azure Purview account name.
- Type 1 if your data sources are located in multiple subscriptions under a top-level Management Group. Type 2 if you are checking readiness for data sources inside a single subscription.
- Type the Management Group or Subscription name.
- Review the output report.
Example using a CSV file (Azure-Purview-automated-readiness-checklist-csv-Input.ps1):
- Create a CSV file (e.g. "C:\Temp\Subscriptions.csv") with four columns:
  - SubscriptionId: All subscription IDs where your data sources reside. Example: 12345678-aaaa-bbbb-cccc-1234567890ab
  - KeyVaultName: The name of an existing key vault deployed in the corresponding data source subscription. Example: ContosoDevKeyVault
  - SecretNameSQLUserName: The name of an existing key vault secret that stores the Azure AD authentication admin user name for Azure Synapse, Azure SQL Servers, or SQL MI. This user can be a member of a group that is set as the Azure AD admin on the Azure SQL Servers. Example: ContosoDevSQLAdmin
  - SecretNameSQLPassword: The name of an existing key vault secret that stores the password of the same Azure AD authentication admin user. Example: ContosoDevSQLPassword
  Note: Before running the script, update the CSV file name and path in the script code, if needed.
- Run the Azure-Purview-automated-readiness-checklist-csv-Input.ps1 script in a PowerShell console with the following parameters:
  - AzureDataType: The data source type. Use any of the following options: "BlobStorage", "AzureSQLMI", "AzureSQLDB", "ADLSGen2", "ADLSGen1", "Synapse", "All"
  - PurviewAccount: The name of your existing Azure Purview account resource.
  - PurviewSub: The ID of the subscription where the Azure Purview account is deployed.
- Type "y" to clear the context.
- Sign in to Azure AD and Azure when prompted.
- If prompted, type a user name and password for SQL authentication. This user must exist in your Azure Active Directory. The script validates whether the user has access to Azure SQL DB, Azure SQL MI, and Azure Synapse dedicated pools. If so, it uses these credentials to connect to each database and validate whether the Azure Purview MSI has the db_datareader role assigned.
- Review the output report.
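Before running the CSV-based script, it can help to check the input file. The following Python sketch validates the four columns described above. Every SubscriptionId must parse as a GUID, and the Key Vault and secret-name columns must not be empty. It only checks the file's shape and does not confirm that the key vaults or secrets exist. The function name `validate_subscriptions_csv` is hypothetical.

```python
import csv
import io
import uuid

COLUMNS = ["SubscriptionId", "KeyVaultName", "SecretNameSQLUserName", "SecretNameSQLPassword"]


def validate_subscriptions_csv(text: str) -> list[str]:
    """Return a list of problems in the CSV content; an empty list means it looks valid."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            uuid.UUID((row["SubscriptionId"] or "").strip())
        except ValueError:
            problems.append(f"line {line_no}: invalid SubscriptionId {row['SubscriptionId']!r}")
        for column in COLUMNS[1:]:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
    return problems


if __name__ == "__main__":
    sample = (
        "SubscriptionId,KeyVaultName,SecretNameSQLUserName,SecretNameSQLPassword\n"
        "12345678-aaaa-bbbb-cccc-1234567890ab,ContosoDevKeyVault,ContosoDevSQLAdmin,ContosoDevSQLPassword\n"
        "not-a-guid,ContosoDevKeyVault,,ContosoDevSQLPassword\n"
    )
    for problem in validate_subscriptions_csv(sample):
        print(problem)
```

When reading a real file saved from a spreadsheet tool, open it with `encoding="utf-8-sig"`. Otherwise a byte-order mark can make the first column name look like it is missing.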
These scripts can help you set up Azure network and access permissions for your Azure Purview account before you scan data sources in your Azure environment with Azure Purview.
Use either of the following PowerShell-based scripts to configure your data sources before scanning them in Azure Purview:
- Azure-Purview-MSI-Configuration.ps1
- Azure-Purview-MSI-Configuration-csv-Input.ps1
Script | Use cases | Prerequisites | Required Permissions |
---|---|---|---|
Azure-Purview-MSI-Configuration.ps1 | | | |
Azure-Purview-MSI-Configuration-csv-Input.ps1 | | | |
These scripts configure access for the Azure Purview Managed Identity (MSI). One benefit of using the Purview Managed Identity to set up scans is that you do not have to explicitly create a credential and link your key vault to Purview to store it. However, you still need to grant the Purview MSI access to your data sources and make sure the data sources are reachable.
Currently, the following data sources are supported in this script:
- Azure Blob Storage (BlobStorage)
- Azure Data Lake Storage Gen 2 (ADLSGen2)
- Azure Data Lake Storage Gen 1 (ADLSGen1)
- Azure SQL Database (AzureSQLDB)
- Azure SQL Managed Instance (AzureSQLMI)
- Azure Synapse dedicated SQL pools (Synapse)
For each data source type, the script can automatically perform the following steps:
Azure Blob Storage (BlobStorage):
- RBAC: Verify and assign the Azure RBAC 'Reader' role to the Azure Purview MSI on the selected scope.
- RBAC: Verify and assign the Azure RBAC 'Storage Blob Data Reader' role to the Azure Purview MSI in each of the subscriptions below the selected scope.
- Networking: Verify and report whether a Private Endpoint is created for the storage account and enabled for Blob Storage.
- Service Endpoint: If no Private Endpoint is enabled, check whether Service Endpoint is ON, and enable 'Allow trusted Microsoft services to access this storage account'.
Azure Data Lake Storage Gen 2 (ADLSGen2):
- RBAC: Verify and assign the Azure RBAC 'Reader' role to the Azure Purview MSI on the selected scope.
- RBAC: Verify and assign the Azure RBAC 'Storage Blob Data Reader' role to the Azure Purview MSI in each of the subscriptions below the selected scope.
- Networking: Verify and report whether a Private Endpoint is created for the storage account and enabled for Blob Storage.
- Service Endpoint: If no Private Endpoint is enabled, check whether Service Endpoint is ON, and enable 'Allow trusted Microsoft services to access this storage account'.
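The "verify and assign" RBAC steps above only need to assign a role when no existing assignment already grants it. The Python sketch below computes that difference. `Assignment` and `plan_role_assignments` are hypothetical names. The scope check models inheritance only along a single scope path, for example a subscription-level assignment covering its resource groups. Inheritance from Management Groups to subscriptions is not modeled, because their scope IDs do not share a prefix.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    principal_id: str
    role: str
    scope: str


def covers(assigned_scope: str, target_scope: str) -> bool:
    """True if an assignment at assigned_scope also applies at target_scope (same path or child)."""
    a = assigned_scope.rstrip("/").lower()
    t = target_scope.rstrip("/").lower()
    return t == a or t.startswith(a + "/")


def plan_role_assignments(existing, principal_id, required):
    """Return the (role, scope) assignments that still need to be created for principal_id."""
    return [
        Assignment(principal_id, role, scope)
        for role, scope in required
        if not any(
            e.principal_id == principal_id and e.role == role and covers(e.scope, scope)
            for e in existing
        )
    ]


if __name__ == "__main__":
    existing = [Assignment("purview-msi-id", "Reader", "/subscriptions/sub-1")]
    required = [
        ("Reader", "/subscriptions/sub-1/resourceGroups/rg-data"),
        ("Storage Blob Data Reader", "/subscriptions/sub-1"),
    ]
    for item in plan_role_assignments(existing, "purview-msi-id", required):
        print(item)
```

Computing the plan first keeps the check separate from the change. Each planned item could then be passed to an assignment command such as `New-AzRoleAssignment` in the Az module.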
Azure Data Lake Storage Gen 1 (ADLSGen1):
- Networking: Verify whether Service Endpoint is ON, and enable 'Allow all Azure services to access this Data Lake Storage Gen1 account' on the Data Lake Storage account.
- Permissions: Verify and assign Read/Execute access to the Azure Purview MSI.
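ADLS Gen1 uses POSIX-style access control lists, in which an entry such as `user:<objectId>:r-x` grants a named user Read and Execute. The Python sketch below checks whether an ACL already grants Read/Execute to the Purview MSI's object ID. If it does not, the sketch builds the access entry and an optional default entry to add. The object ID is a placeholder, the helper names are hypothetical, and the check ignores the ACL mask and group memberships.

```python
import uuid


def has_read_execute(acl: list[str], object_id: str) -> bool:
    """True if a named 'user' access entry for object_id includes both r and x."""
    for entry in acl:
        parts = entry.split(":")
        # Access entries look like 'user:<objectId>:r-x'; default entries start with 'default:'.
        if len(parts) == 3 and parts[0] == "user" and parts[1].lower() == object_id.lower():
            return "r" in parts[2] and "x" in parts[2]
    return False


def read_execute_entries(object_id: str, include_default: bool = True) -> list[str]:
    """Build the ACL entries that grant Read/Execute to object_id."""
    uuid.UUID(object_id)  # raises ValueError if the object ID is not a GUID
    entries = [f"user:{object_id}:r-x"]
    if include_default:
        # Default entries are inherited by new child items created under a folder.
        entries.append(f"default:user:{object_id}:r-x")
    return entries


if __name__ == "__main__":
    msi_object_id = "11111111-2222-3333-4444-555555555555"  # placeholder object ID
    current_acl = ["user::rwx", "group::r-x", "other::---"]
    if not has_read_execute(current_acl, msi_object_id):
        print(read_execute_entries(msi_object_id))
```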
Azure SQL Database (AzureSQLDB):
- SQL Servers:
  - Network: Verify and report whether a Public or Private Endpoint is enabled.
  - Firewall: If the Private Endpoint is off, verify the firewall rules and enable 'Allow Azure services and resources to access this server'.
  - Azure AD Admin: Enable Azure AD authentication for the Azure SQL Server.
- SQL Databases:
  - SQL Role: Assign the db_datareader role to the Azure Purview MSI.
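Assigning db_datareader to the Purview MSI is done with T-SQL inside each database, while connected as an Azure AD admin. The Python sketch below only builds the statements. `CREATE USER ... FROM EXTERNAL PROVIDER` and `ALTER ROLE ... ADD MEMBER` are standard Azure SQL syntax for Azure AD principals. A system-assigned managed identity's user name is the Purview account name. The helper names and the sample account name `contoso-purview` are hypothetical, and the script's own statements may differ.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any ']' as QUOTENAME does."""
    return "[" + name.replace("]", "]]") + "]"


def grant_db_datareader_sql(purview_account: str) -> str:
    """Build T-SQL that creates the Purview MSI user (if absent) and adds it to db_datareader."""
    user = quote_name(purview_account)
    literal = purview_account.replace("'", "''")  # escape for use inside N'...'
    return (
        f"IF NOT EXISTS (SELECT 1 FROM sys.database_principals WHERE name = N'{literal}')\n"
        f"    CREATE USER {user} FROM EXTERNAL PROVIDER;\n"
        f"ALTER ROLE db_datareader ADD MEMBER {user};"
    )


if __name__ == "__main__":
    print(grant_db_datareader_sql("contoso-purview"))
```

Run the generated batch in each target database, not in `master`. Database users and role memberships are scoped to the database where they are created.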
Azure SQL Managed Instance (AzureSQLMI):
- SQL Managed Instance Servers:
  - Network: Verify whether a Public or Private Endpoint is enabled, and report if the Public Endpoint is disabled.
  - ProxyOverride: Verify whether the Azure SQL Managed Instance is configured as Proxy or Redirect.
  - Networking: Verify and update NSG rules to allow inbound access from AzureCloud to the SQL Managed Instance over the required ports (Redirect: 1433 and 11000-11999; Proxy: 3342).
  - Azure AD Admin: Enable Azure AD authentication for the Azure SQL Managed Instance.
- SQL Databases:
  - SQL Role: Assign the db_datareader role to the Azure Purview MSI.
Azure Synapse (Synapse):
- RBAC: Verify and assign the Azure RBAC 'Reader' role to the Azure Purview MSI on the selected scope.
- RBAC: Verify and assign the Azure RBAC 'Storage Blob Data Reader' role to the Azure Purview MSI in each of the subscriptions below the selected scope.
- SQL Servers (Dedicated Pools):
  - Network: Verify and report whether a Public or Private Endpoint is enabled.
  - Firewall: If the Private Endpoint is off, verify the firewall rules and enable 'Allow Azure services and resources to access this server'.
  - Azure AD Admin: Enable Azure AD authentication for the Azure SQL Server.
- SQL Databases:
  - SQL Role: Assign the db_datareader role to the Azure Purview MSI.
As the scope for your data sources, you can select a top-level Management Group or a Subscription, or you can use a CSV file that contains a subset of your Subscriptions. If you select a Management Group, the script runs on all subscriptions inside the Management Group, including those in child Management Groups. If you select Subscriptions as the data source scope, the script runs only on the resources in the selected subscriptions.
The following prerequisites are required to successfully run the script:
- An Azure Subscription.
- An Azure Purview Account.
- Access to Azure AD and Azure Subscriptions where Azure Purview account and data sources are deployed.
- If a CSV file is used, valid user credentials must be stored as Key Vault secrets in each subscription. The script uses these credentials to configure Azure AD authentication for Azure SQL Servers, Azure Synapse, or Azure SQL Managed Instances.
The following minimum permissions are needed to run the script in your Azure environment:
Role | Scope | Why is it needed? |
---|---|---|
Global Reader | Azure AD Tenant | To read Azure SQL Admin user group membership and Azure Purview MSI |
Global Administrator | Azure AD Tenant | To assign the 'Directory Readers' role to Azure SQL Managed Instances |
Contributor | Subscription or Resource Group where Azure Purview Account is created | To read the Azure Purview account resource, and to create a Key Vault resource and a secret |
Owner or User Access Administrator | Management Group or Subscription where your Azure Data Sources reside | To assign RBAC roles |
Contributor | Management Group or Subscription where your Azure Data Sources reside | To set up network configuration |
SQL Admin (Azure AD Authentication) | Azure SQL Servers or Azure SQL Managed Instances | To assign the db_datareader role to Azure Purview |
Access to your Azure Key Vault | Azure Key Vault that stores the SQL credentials | To get/list the Key Vault secrets used for Azure SQL DB, SQL MI, or Synapse authentication |
This script requires Azure PowerShell Az Modules.
Use the steps in the following examples to run the scripts.
Interactive example (Azure-Purview-MSI-Configuration.ps1):
- Run the Azure-Purview-MSI-Configuration.ps1 script in a PowerShell console and select any of the desired options as data source types.
- Type "y" to clear the context.
- Sign in to Azure AD and Azure when prompted.
- Type the name of the Azure subscription that contains your Azure Purview account.
- Type the Azure Purview account name.
- Type 1 if your data sources are located in multiple subscriptions under a top-level Management Group. Type 2 if you are configuring data sources inside a single subscription.
- Type the Management Group or Subscription name.
- When prompted, provide the information required to configure access management or network configuration for the data sources within the defined scope.
- Review the output report.
Example using a CSV file (Azure-Purview-MSI-Configuration-csv-Input.ps1):
- Create a CSV file (e.g. "C:\Temp\Subscriptions.csv") with four columns:
  - SubscriptionId: All subscription IDs where your data sources reside. Example: 12345678-aaaa-bbbb-cccc-1234567890ab
  - KeyVaultName: The name of an existing key vault deployed in the corresponding data source subscription. Example: ContosoDevKeyVault
  - SecretNameSQLUserName: The name of an existing key vault secret that stores the Azure AD authentication admin user name for Azure Synapse, Azure SQL Servers, or SQL MI. This user can be a member of a group that is set as the Azure AD admin on the Azure SQL Servers. Example: ContosoDevSQLAdmin
  - SecretNameSQLPassword: The name of an existing key vault secret that stores the password of the same Azure AD authentication admin user. Example: ContosoDevSQLPassword
  Note: Before running the script, update the CSV file name and path in the script code, if needed.
- Run the Azure-Purview-MSI-Configuration-csv-Input.ps1 script in a PowerShell console with the following parameters:
  - AzureDataType: The data source type. Use any of the following options: "BlobStorage", "AzureSQLMI", "AzureSQLDB", "ADLSGen2", "ADLSGen1", "Synapse", "All"
  - PurviewAccount: The name of your existing Azure Purview account resource.
  - PurviewSub: The ID of the subscription where the Azure Purview account is deployed.
- Type "y" to clear the context.
- Sign in to Azure AD and Azure when prompted.
- If prompted, type a user name and password for SQL authentication. A prompt means that the user name stored in the Key Vault secret does not match your Azure SQL Azure AD authentication configuration. You can then provide your administrator's credentials to connect to the Azure SQL Servers instead. Other settings do not require any prompts.
- Review the output report.
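Both CSV-input scripts take an AzureDataType value from the fixed list given in their run steps. The Python sketch below shows one way to validate such a parameter and expand "All" into every supported type. The list of values comes from this README. Two behaviors are assumptions of the sketch, not documented behavior of the scripts: accepting several values at once, and matching them case-insensitively. The function name `expand_data_types` is hypothetical.

```python
DATA_TYPES = ["BlobStorage", "AzureSQLMI", "AzureSQLDB", "ADLSGen2", "ADLSGen1", "Synapse"]


def expand_data_types(requested: list[str]) -> list[str]:
    """Validate AzureDataType values and expand 'All'; duplicates are dropped, order kept."""
    canonical = {name.lower(): name for name in DATA_TYPES}
    result: list[str] = []
    for value in requested:
        key = value.strip().lower()
        if key == "all":
            return list(DATA_TYPES)
        if key not in canonical:
            raise ValueError(f"unsupported AzureDataType: {value!r}")
        if canonical[key] not in result:
            result.append(canonical[key])
    return result


if __name__ == "__main__":
    print(expand_data_types(["blobstorage", "Synapse", "BlobStorage"]))
```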