Self-Service – Data Democratization, Governance, and Security
By Sheila Simpson / June 8, 2021
Self-Service
Self-service is at the heart of the democratization of enterprise data, and managing metadata is what makes it possible.
Azure Time Series Insights, Azure ML (Machine Learning), AI (artificial intelligence) services, and Azure Stream Analytics are some of the tools in the Azure tech stack that business users can use for self-service data science.
Metadata is the information that describes the characteristics and usage of a particular data asset. It covers what the asset is, what it means, how and where it is used, where and when it came from, and how accurate and current it is. For example, source metadata consists of the system of record, physical files/databases, copybooks, procedures, parameters, schedules, transactions, and data flows. Data integration metadata consists of data transaction logic, conversion matrix and cycle, reformatting rules, business rules, reconciliation rules, and extraction history.
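As a rough illustration of the aspects listed above, a metadata record for a single data asset could be modeled as a small data structure. This is only a sketch: the class `AssetMetadata`, its field names, the helper `is_stale`, and the sample values are all hypothetical and do not come from any Azure service.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AssetMetadata:
    """Hypothetical metadata record for one data asset."""
    name: str                      # what the asset is
    description: str               # what it means
    system_of_record: str          # where it came from
    used_by: List[str] = field(default_factory=list)  # how and where it is used
    last_refreshed: Optional[date] = None             # how current it is
    accuracy_note: str = ""                           # how accurate it is


def is_stale(meta: AssetMetadata, today: date, max_age_days: int = 7) -> bool:
    """Return True when the refresh date is missing or older than max_age_days."""
    if meta.last_refreshed is None:
        return True
    return (today - meta.last_refreshed).days > max_age_days


# Illustrative values only.
orders = AssetMetadata(
    name="orders",
    description="One row per customer order",
    system_of_record="ERP (illustrative)",
    used_by=["monthly sales report"],
    last_refreshed=date(2021, 6, 1),
)
```

Keeping the "how current" aspect as a real date rather than free text lets a self-service user check freshness before relying on the asset.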
Data Catalog and Data Sharing
A data catalog is an organized inventory of metadata that consists of glossaries, dictionaries, data models, lineage, data quality metrics, and data usage. In the context of data sharing, its purpose is to provide data definitions and business definitions in a single place so that everyone interprets the data in a standardized way. The owners of the data catalog can add, delete, or change registered data assets, so a clear ownership framework is a prerequisite for success. The data catalog can be shared and integrated with other systems through REST APIs.
A data dictionary is the system catalog of databases that IT or data engineering teams use to understand physical data assets. It holds technical metadata, such as names, attributes, keys, indexes, formats (length, type, etc.), valid values in columns, default values in columns, and relationships of fields within a table as well as with other tables.
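To make the idea of technical metadata concrete, the sketch below reads column-level details from a database's own system catalog. It uses SQLite from the Python standard library purely for illustration (it is not an Azure tool), and the `customer` table and the helper `data_dictionary` are hypothetical.

```python
import sqlite3

# In-memory database with one illustrative table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " country TEXT DEFAULT 'US')"
)


def data_dictionary(conn, table):
    """Read column-level technical metadata for one table from SQLite's catalog."""
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk.
    # The table name is interpolated directly, so pass only trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "column": name,
            "type": col_type,
            "nullable": not notnull,
            "default": default,
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, default, pk in rows
    ]


entries = data_dictionary(conn, "customer")
for entry in entries:
    print(entry)
```

Each resulting entry carries the name, type, key, and default-value details that a data dictionary is expected to hold for a column.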
The purpose of a business glossary is to create clear, consistent, and standard meanings of business terms, key performance indicators (KPIs), and the formulas of these KPIs across the enterprise, so that everyone arrives at the same understanding. The business glossary contains a collection of business terms in the language familiar to business users, with definitions, contexts, examples, and variations across business domains.
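A business glossary can be pictured as a lookup from a business term, or any of its variations, to one agreed definition and formula. The following is a minimal sketch under that assumption; `GlossaryTerm`, `BusinessGlossary`, and the sample KPI entry are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GlossaryTerm:
    """Hypothetical glossary entry for one business term or KPI."""
    term: str
    definition: str
    formula: str = ""                                  # KPI formula, if any
    synonyms: List[str] = field(default_factory=list)  # variations across domains


class BusinessGlossary:
    """Maps a term and its synonyms to a single shared entry."""

    def __init__(self) -> None:
        self._terms: Dict[str, GlossaryTerm] = {}

    def add(self, entry: GlossaryTerm) -> None:
        # Register the main term and every synonym, case-insensitively.
        for key in [entry.term, *entry.synonyms]:
            self._terms[key.lower()] = entry

    def lookup(self, name: str) -> Optional[GlossaryTerm]:
        return self._terms.get(name.lower())


glossary = BusinessGlossary()
glossary.add(GlossaryTerm(
    term="Gross Margin",
    definition="Share of revenue left after cost of goods sold.",
    formula="(revenue - cogs) / revenue",
    synonyms=["GM"],
))
```

Because every synonym resolves to the same entry, two departments that use different names for a KPI still end up with the same definition and formula.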
Figure 6-2 demonstrates that a data catalog consists of two categories: data dictionary (technical metadata) and business definitions (business metadata).
Figure 6-2. Data catalog
For the democratization of data, a data catalog makes it easy to discover data sources and to understand the data they hold. This resolves the challenges related to maintenance and discovery of information assets. Data catalogs add value by making it easy for users to find patterns and meaning in the data and to make sense of it.
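Discovery in a catalog often comes down to searching across both business and technical metadata. The sketch below shows one simple way that could work, as a keyword match over an in-memory list; `CatalogEntry`, `search`, and the sample assets are hypothetical, and a real catalog product would offer a far richer search.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry pairing business and technical metadata."""
    asset: str                 # registered data asset
    business_definition: str   # business metadata
    columns: List[str]         # technical metadata


def search(catalog: List[CatalogEntry], keyword: str) -> List[str]:
    """Return asset names whose name, definition, or columns mention the keyword."""
    needle = keyword.lower()
    return [
        e.asset
        for e in catalog
        if needle in e.asset.lower()
        or needle in e.business_definition.lower()
        or any(needle in c.lower() for c in e.columns)
    ]


# Illustrative assets only.
catalog = [
    CatalogEntry("sales.orders", "Customer orders placed through any channel",
                 ["order_id", "customer_id", "amount"]),
    CatalogEntry("crm.customer", "Master record for each customer",
                 ["customer_id", "name", "country"]),
    CatalogEntry("hr.employee", "Current and former staff",
                 ["employee_id", "hire_date"]),
]

print(search(catalog, "customer"))
```

Searching both kinds of metadata means a business user can find an asset by a familiar term, while an engineer can find the same asset by a column name.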
Multiple tools are available on the market, both from cloud vendors and for on-premises servers. Some examples of tools found within the Azure cloud platform technical stack are as follows:
• Azure Data Catalog: Enables a data catalog inside an enterprise
• Azure Data Share: Enables management and sharing of data outside the enterprise. It also allows everyone within or outside the business to contribute their insights.
Figure 6-3 shows the relationship between the types of metadata and the classes of metadata in an organization, covering source metadata, data integration metadata, data warehouse metadata, metrics and reporting metadata, and reference data.
Figure 6-3. Types and classes of metadata