Data Vault: What is it and when should it be used? (2024)

Data Vault 2.0 methodology takes not only modeling technique, but provides an entire methodology for all Data Warehouse Projects.

For many years, business intelligence (BI) projects have and continue to be operating under a waterfall model. It’s defined by a long-stretched sequence of each phase that demands an exhaustive list of upfront requirements, a complete data model design followed by codifying all hard and soft business rules into ETL processes. The visualization layer is sequentially built and presented to end users for sign off – months or even years from the original start date.

Data Vault: What is it and when should it be used? (1)

Fairly often we also see teams adopt a “reduced scope” version of waterfall that aims to break large BI initiatives into smaller projects. While this helps to reduce overall complexity, this approach, when applied to BI, is still quite risky because of two primary concerns:

  • the business requirements are now changing faster than the ability to deliver;
  • and budget holders are unwilling to spend into long-term projects with no materialized short-term results.

The above reasons are why we’ve seen a shift in project methodologies from waterfall into the iterative nimble approach of agile - which recognizes and provides some answers to these issues.

Within the data analytics domain, agile alone does not address the significant challenges we encounter at the more detailed levels of Data Warehouse or BI projects. These include:

  • iterating over data modeling
  • minimizing refactoring
  • designing of ETL or ELT routines that enable rapid response to changes in business logic or new additions of data
  • an approach to gathering business requirements that will closely tie to the input required for design decisions

In response to these challenges, Daniel Linstedt, author of Building Scalable Data Warehouse with Data Vault 2.0, defines a methodology that focuses on getting the most out of agile practices with other proven disciplines and techniques to deliver what seems to be the most iterative approach to BI yet.

Contrary to popular belief, Data Vault (DV) is not just a modeling technique, it’s an entire methodology for data warehouse projects. It binds together aspects of agile, BEAM requirements gathering, CMMI, TQM, Six Sigma and Data Vault Modelling to define an approach targeted at improving both the speed and quality of BI projects. I refer to it as the “guided missile approach” since it promotes both adaptation and accuracy.

DV also encompasses agile methods on Data Warehouse project estimation and agile task sizing to determine the traditionally overlooked complexity or work effort involved across the common Data Warehouse components. At the lower levels, it also presents a very concise and iterative approach to tackling common technical deliverables (within the BI world) with new or changing feature requests. These include thought-out, repeatable, step-by-step and agile based processes to accomplish frequent tasks.

These tasks include (but are not limited to) adding data attributes, slices, new sources, augmented sources, historical tracking, deprecating sources and source structure changes at both the ETL and Modelling phases.

The DV model, in a nutshell, is a layer that exists between regular dimensional modeling (OLAP, Star Schema) and Staging that provides scaling with growing business requirements and serves to break down complexities of both the modeling and ETL. It’s composed of hubs (business entities), links (relationships) and satellites (descriptive attributes) which are modeled somewhere between the 3NF and star schema. The model is positioned inside the data integration layer of the Data Warehouse, commonly referred to as the Raw Data Vault, and is effectively used in combination with Kimball’s model.

Data Vault: What is it and when should it be used? (2)

Tip: If you are interested in understanding the model and its underlining rules, I suggest grabbing a copy of Dan’s book mentioned above.

Here is an overview of some key benefits from the Data Vault 2.0 Approach:

  • It assumes the worst-case scenario for data modeling relationships. N:M relationships between business objects to eliminate the need for updates if a 1:M turns into an M:M. Thereby requiring virtually no additional work within Data Vault when the degree of relationship changes.
  • It is designed for historical tracking all aspects of data - relationships and attributes as well as where the data is being sourced from over time. Satellites, which are similar to dimensions, operate similarly to SCD Type 2.
  • Puts forth a set of design principles & structures for increasing historical tracking performance within the Vault (PiT and Bridge). The Data Vault model is flexible enough to adopt these structures at any point in time within the iterative modeling process and does not require advanced planning.
  • Designed to logically separate spaces containing raw vs. altered data. Raw data vault is the basis for data that is auditable to source systems and the business vault provides a place for power users who need access to data one step down from the information mart.
  • Separates out soft and hard business rules into different parts of the data integration. This enforces reusability of data across multiple end uses. For example, raw data is only sourced once within the Data Vault (less re-integrating into staging) and can be fed multiple times to downstream needs.
  • For each agile iteration, the Data Vault model, which stores all the historical tracking of data, is easily extensible without having to worry about losing historical data. Also, historical tracking is stored independently from the dimensional model.
  • Data Vault 2.0 advocates hash key implementation of business keys to reduce lookups and therefore increase loading parallelization. This results in less sequential loading dependencies.
  • The Raw Data Vault is designed to be completely auditable.
  • As a whole, the processing involved with going from Staging to Star Schema & OLAP is made much more smoothly & iterative with Data Vault.
  • It provides a very thought out approach to combining data with multiple different business keys from heterogeneous data sources (a common problem with integrating data within the warehouse across multiple source systems). Business keys are not always 1:1 or in the same format.
  • The “just in time” modeling mentality is a good match with the agile approach.

While there are many advantages to Data Vault, it also has its shortcomings, such as:

  • Data Vault is essentially a layer between the information mart / star schema and staging. There is some additional overhead that comes with developing this layer both in terms of ETL development and modeling. If the project is on a small scale or the project’s life is short-lived, it may not be worth pursuing a Data Vault model.
  • One of the main driving factors behind using Data Vault is for both audit and historical tracking purposes. If none of these are important to you or your organization, it can be difficult to eat the overhead required to introduce another layer into your modeling. However, speaking from long-term requirements, it may be a worthwhile investment upfront.
  • Data Vault represents a decomposed approach to relationships, business keys and attributes and therefore the number of tables being created is high when compared to denormalized structures such as star schema. However, consider that Data Vault compliments star schema so this comparison is for contrasting purposes only. For this reason, many joins are required to view data within the DV.
  • At the time of writing this – DV resources are limited. Complex projects using DV 2.0 are not widespread information.
  • The modeling approach, in general, can be very unconventional for those who have been operating under Kimball and (less-so) Inmon’s models.

The answer depends on a few variables.

We see the Data Vault modeling as a very viable approach to meet the needs of Data Warehouse projects, where both historical tracking and auditability are two important factors.

Additionally, if relationships across business entities are constantly evolving in your data (example 1:M to M:M), Data Vault simplifies the capture of those relationships and lets you focus more on delivering real value.

If your organisation plans on storing PII data within the warehouse and is subject to GDPR, HIPPA or other regulations, Data Vault will help with data audits and traceability.

It’ll be important to take both the benefits and drawbacks listed above to help choose whether a Data Vault approach is advantageous for your use case.

Data Vault: What is it and when should it be used? (2024)

FAQs

Data Vault: What is it and when should it be used? ›

Data Vault is a method and architecture for delivering a Data Analytics Service to an enterprise supporting its Business Intelligence, Data Warehousing, Analytics and Data Science requirements. At the core it is a modern, agile way of designing and building efficient, effective Data Warehouses.

When should I use data vault? ›

If your source architecture is prone to changes, such as the addition or deletion of columns, new tables, or new/altered relationships, you should definitely implement a data vault. A 3NF model has high entity dependencies, given that entities are correlated.

What is the purpose of the data vault? ›

Data vault focuses on ensuring long-term sustainability, and it does so by compartmentalizing the data warehouse into specific types of tables, each serving a unique function. Here are the primary components in a data vault architecture: Hubs. Links.

What problems does data vault solve? ›

It is a data model that is architected specifically to meet the needs of today's enterprise data warehouses.” The DV was developed specifically to address agility, flexibility, and scalability issues found in the other main stream data modeling approaches used in the data warehousing space.

What are the use cases for data vault? ›

The Data Vault methodology can be applied to the Silver layer where data is transformed into Hubs, links and satellites. In the Gold layer, multiple data marts/data warehouses can be built as per dimensional modeling/Kimball methodology.

Why not to use data vault? ›

The disadvantages of the Data Vault

Afterwards, you have to “work back” to a dimensional data model. More knowledge required: with the Data Vault comes a 3rd modeling technique. Employees must master these techniques bit by bit. No integrity: the data in the Data Vault lacks integrity and is not always correct.

How safe is DataVault? ›

DataVault in the Press

Protects confidential information using AES encryption and advanced security features such as Touch ID, Auto-Lock and Maximum Login Attempts. Provides powerful features such as flexible templates, password generator and synchronization with mobile and desktop devices (sold separately).

What is the purpose of a vault? ›

It protects the casket from the weight of the earth and heavy maintenance equipment that will pass over the grave. It also helps resist water and preserves the beauty of the cemetery or memorial park by preventing the ground from settling.

What is the use of vault? ›

Vault provides encryption as a service with centralized key management to simplify encrypting data in transit and stored across clouds and datacenters. Vault can encrypt/decrypt data stored elsewhere, essentially allowing applications to encrypt their data while storing it in the primary data store.

What is the difference between a data vault and a data warehouse? ›

Data vaults store raw data as-is without applying business rules. Data transformation happens on-demand, and the results are available for viewing in a department-specific data mart. While a traditional data warehouse structure relies on extensive data pre-processing, the data vault model takes a more agile approach.

How much does DataVault cost? ›

Ascendo DataVault Plans and Pricing
WindowsMac
Price$26.94$19.95
Best forWindows usersMac users
Free trial30-day30-day
Platform compatibilityWeb (Chrome, Edge, Firefox), WindowsMac, Web (Safari), Windows
11 more rows

How popular is data vault? ›

Introduced by Dan Linstedt in the 1990s, Data Vault has gained popularity since. It is particularly well-suited for the automation of data integration while accommodating changes in source data structures over time.

What is the difference between data vault and data mesh? ›

Data Vault achieves scalability by integrating new data sources into the centralized architecture, allowing for centralized control. In contrast, Data Mesh facilitates scalability by enabling domains to scale their data products and services independently.

Which is a benefit of a data vault? ›

One of the main benefits of data vault is that it can handle complex and changing data sources, such as operational systems, streaming data, or external data. Data vault can capture the history and lineage of data, as well as the changes and variations in data structures and semantics over time.

What are the core business concepts of data vault? ›

Core Business Concepts are related to other ones through Natural Business Relationships (Links) and are described using Context (Satellites) entities. Any Hub entity in Data Vault can therefore have multiple active relationships to other entities through Links.

Is data vault normalized? ›

Data Vault sounds like the old relational model. It's tables are highly normalized. It has a bit of star/snowflake because there's facts (hubs) and dimensions (satellites), but relationship between tables follows essentially the relational model.

What is very good vault used for? ›

VGS provides multiple methods for minimizing the sensitive information that needs to be stored which allows customers to secure any type of data for any use-case. Tokenization is a method that focuses on securing the storage of data. This is the quickest way to get started and is free.

When should you use a data warehouse? ›

Data warehouses are commonly used primarily for combining data from one or more sources, reducing load on operational systems, tracking historical changes in data and providing a single source of truth. Today's BI applications can provide solutions for almost all of those uses by: Pulling data from multiple sources.

When should I use key vault? ›

Secrets Management - Azure Key Vault can be used to Securely store and tightly control access to tokens, passwords, certificates, API keys, and other secrets.

Should I use Coinbase vault? ›

The vault is an ideal storage tool if you wish to keep some digital assets online, but don't require quick withdrawals and want to be notified whenever there is a request to spend the funds.

Top Articles
Latest Posts
Article information

Author: Cheryll Lueilwitz

Last Updated:

Views: 5413

Rating: 4.3 / 5 (54 voted)

Reviews: 93% of readers found this page helpful

Author information

Name: Cheryll Lueilwitz

Birthday: 1997-12-23

Address: 4653 O'Kon Hill, Lake Juanstad, AR 65469

Phone: +494124489301

Job: Marketing Representative

Hobby: Reading, Ice skating, Foraging, BASE jumping, Hiking, Skateboarding, Kayaking

Introduction: My name is Cheryll Lueilwitz, I am a sparkling, clean, super, lucky, joyous, outstanding, lucky person who loves writing and wants to share my knowledge and understanding with you.