Fivetran yesterday announced the release of an API designed to push data pipeline metadata into data catalogs. By adding to the already extensive library of metadata held in catalogs such as Collibra, Alation, and others, the API aims to improve data quality and data governance.
The metadata API is useful for analyzing data in flight, between source and destination systems. Fivetran also plans a capability for detecting changes in sources before data moves, which is critical for preserving regulatory compliance.
Many of these features, according to Viswanathan, a senior product manager at Fivetran, rest on the fact that "what the API provides is source column to destination column mapping."
As such, it is possible to see even minor changes in table schema and naming conventions. Pairing this information with data lineage graphs assists with impact analysis, so businesses may fully understand the consequences of changes made from source to target systems via data pipelines.
Viswanathan said organizations previously had no way to obtain this mapping. They had some pipeline information, but it was disparate, and the most they could do was summarize it as a list of Fivetran assets. Mapping data from source to destination was not possible.
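To make the idea of source-column-to-destination-column mapping concrete, here is a minimal Python sketch of how such records could support impact analysis. The `ColumnMapping` shape, the function names, and the sample data are all assumptions made for illustration; they are not Fivetran's actual API or response format.

```python
from dataclasses import dataclass


# Hypothetical record shape; not Fivetran's actual response format.
@dataclass(frozen=True)
class ColumnMapping:
    source_table: str
    source_column: str
    dest_table: str
    dest_column: str


def impacted_destinations(mappings, source_table, source_column):
    """Return the destination (table, column) pairs fed by one source column."""
    return sorted(
        (m.dest_table, m.dest_column)
        for m in mappings
        if m.source_table == source_table and m.source_column == source_column
    )


def renamed_columns(mappings):
    """Return mappings whose column name differs between source and destination."""
    return [m for m in mappings if m.source_column != m.dest_column]


# Made-up sample data for illustration only.
MAPPINGS = [
    ColumnMapping("crm.accounts", "AccountId", "analytics.accounts", "account_id"),
    ColumnMapping("crm.accounts", "Name", "analytics.accounts", "name"),
    ColumnMapping("crm.orders", "amount", "analytics.orders", "amount"),
]

if __name__ == "__main__":
    print(impacted_destinations(MAPPINGS, "crm.accounts", "AccountId"))
    print([m.source_column for m in renamed_columns(MAPPINGS)])
```

With records like these, a catalog could answer "which destination columns are affected if this source column changes?" and surface naming differences between source and target.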
Data governance
The metadata API is suitable for organizations with established data governance processes in place, particularly those pertaining to data access, data privacy, and regulatory compliance. It provides the transparency and monitoring that governance in these areas requires by "helping customers understand what's going on within the pipeline," according to Viswanathan.
Fivetran is planning to extend its metadata API so users can detect schema modifications before data even moves. If someone accidentally adds a column of personally identifiable information (PII) to a dataset, for example, security and governance teams can see the change in the data catalog and stop that data from moving before it violates compliance mandates.
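The planned check can be pictured as a comparison of two schema snapshots taken before a sync. The Python sketch below is illustrative only: the snapshot format, the function names, and the name-based PII heuristic are assumptions, not a description of how Fivetran or any catalog implements it.

```python
import re

# Assumed naming heuristic for illustration; real PII classification
# would normally come from the catalog's own tagging rules.
PII_PATTERN = re.compile(r"(ssn|email|phone|birth|passport)", re.IGNORECASE)


def added_columns(previous, current):
    """Return {table: [columns]} for columns present now but not before."""
    added = {}
    for table, columns in current.items():
        new = sorted(set(columns) - set(previous.get(table, [])))
        if new:
            added[table] = new
    return added


def pii_candidates(added):
    """Return (table, column) pairs whose names look like PII."""
    return [
        (table, column)
        for table, columns in sorted(added.items())
        for column in columns
        if PII_PATTERN.search(column)
    ]


def should_block_sync(previous, current):
    """Hold the pipeline if any newly added column looks like PII."""
    return bool(pii_candidates(added_columns(previous, current)))


# Made-up snapshots: someone has added an email column to the source.
PREVIOUS = {"customers": ["id", "plan", "created_at"]}
CURRENT = {"customers": ["id", "plan", "created_at", "email_address", "region"]}

if __name__ == "__main__":
    print(added_columns(PREVIOUS, CURRENT))
    print(should_block_sync(PREVIOUS, CURRENT))
```

A name-based pattern is a deliberately crude stand-in here; it is enough to show where a governance team's review step would sit relative to the data movement.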
Quality of data
The metadata API also bears on data quality. For example, analysts looking at sales information in a cloud data warehouse may wonder where certain numbers came from. In this regard, it "helps you draw that line between saying this is how your data moved, this is the tool that was used, and these are the owners within the pipeline of data," Viswanathan said.
This is most apparent when the data catalogs that receive this metadata include data lineage graphs, which let users clearly see this and other pertinent information. Viswanathan described a situation in which an analyst wants to evaluate the basic data quality of revenue figures in Looker. Lineage shows that the data was moved from its source, went through changes within Snowflake, and was then exposed in Looker. So, you can trace your data back to its origin.
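Tracing a figure back to its origin amounts to walking a lineage graph upstream. The following Python sketch assumes a simple dictionary of lineage edges; the asset names (including the unnamed source system, shown here as `source_app.orders`) and the `trace_to_origin` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical lineage edges: each asset maps to the assets it is built from.
UPSTREAM = {
    "looker.revenue_dashboard": ["snowflake.analytics.revenue"],
    "snowflake.analytics.revenue": ["snowflake.raw.orders"],
    "snowflake.raw.orders": ["source_app.orders"],
    "source_app.orders": [],
}


def trace_to_origin(upstream, asset):
    """Return every path from `asset` back to an asset with no upstream."""
    paths = []

    def walk(node, path):
        path = path + [node]
        parents = upstream.get(node, [])
        if not parents:
            # No upstream assets: this node is an origin.
            paths.append(path)
            return
        for parent in parents:
            if parent not in path:  # guard against cycles
                walk(parent, path)

    walk(asset, [])
    return paths


if __name__ == "__main__":
    for path in trace_to_origin(UPSTREAM, "looker.revenue_dashboard"):
        print(" <- ".join(path))
```

Each returned path reads from the dashboard back to the source, which is the line an analyst would follow when questioning a revenue number.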
Management of metadata
Fivetran's metadata API extends these dimensions of data governance, and the visibility on which they depend, into data pipelines, an area that was previously unexplored. This transparency serves several governance purposes, from regulatory compliance to access controls and data modeling.