Architecture

dhw

1. 01 Introduction and Goals

DHW (DataHandwerk) is designed to support BI and DWH developers working with the Microsoft BI platform (MS SQL Server).

Handwerk statt Massenproduktion. - Craft instead of mass production.
MSSQL based tools for data craftsmen. - Individual customization instead of assembly line.

Handwerk = craft
Handwerker = craftsmen

Craft is the term used to describe numerous commercial activities that manufacture products mostly to order or provide services on demand. Craft activities are contrasted with industrial mass production.

Als Handwerk werden zahlreiche gewerbliche Tätigkeiten bezeichnet, die Produkte meist auf Bestellung fertigen oder Dienstleistungen auf Nachfrage erbringen. Die handwerkliche Tätigkeit steht der industriellen Massenproduktion gegenüber.

The project’s key stakeholders are DWH developers who see their work as a craft.

  • A toolkit is created for these data craftsmen and gradually filled and expanded with tools.

  • The data craftsmen are given the possibility to work faster and better with the toolkit and to automate parts if (and only if) they wish to do so.

  • Tools and methods can be used individually and independently optional. They do not have to be used in a mandatory way.

  • The toolbox should always be perceived as an enrichment and aid, gladly as a supportive, never as a constricting corset.
    A developer is not undesirably restricted by the use of tools and methods. He allows himself to be partially and consciously restricted if the advantages subjectively outweigh the disadvantages for him.

1.1. Requirements Overview

  1. Combination of manual work and the use of tools

    • The combination of handwork without tools and the use of tools is easily possible. There must be no one-way street.

    • A developer does not become a slave to the tools or the methods, but can at any time, like a craftsman, decide for himself to what extent he wants to use which tools to support him, or not.

  2. mssql and Azure features

    • It must always be possible for BI developers to use all mssql and Azure features. So ultimately also Azure Synapse Analytics, Data Factory, BLOB Storage and other Azure products.

    • Whenever possible or desired, a developer may prefer to use mssql features to realize goals, rather than proprietary solutions. Tools should simplify their use.

      • temporal tables

      • graph databases

      • …​

  3. a minimally invasive repository supplements database metadata

    • Functions and methods that are absent or insufficient in a database are supplemented via a repository (per database), which is connected via synonyms to system catalog views of the DWH database.

    • A repository is also a database: the repository database.

    • This repository does not need and should not contain all metadata to fully define a database, but only necessary or useful additions.

    • A repository is externally connected to a DWH database, with minimal invasive interventions in this database.

    • these minimal invasive interventions can be undone

    • for these minimal invasive interventions (according to current conception) extended properties are used, which can be assigned to many database objects and also columns, parameters etc. and also removed again.

      • by using unique guid it is possible to copy objects including the additional metadata stored in the extended properties into other databases.

  4. Independence from immediate support

    • One should be able to work even if a tool does not work as expected. There shall be no blocking dependency on a tool or a method that requires support fire department interventions, but there shall always be a manual way out.

  5. Documentation

    • The repository, how it works, and the methods are clearly and thoroughly documented. Good documentation

      • provides transparency

      • creates understanding

      • facilitates independent work

      • makes it easier to get started

      • reduces dependence on support

  6. Modular structure and expandability

    • The toolbox concept allows the toolbox to be extended with modules (tools) by different users.

    • Examples of possible modules:

      • Relation Management

        • virtual PK and FK

      • Persistence and Historization Assistant

      • Documentation Wizard (including creation and inheritance of descriptions and Friendly Name)

      • Data Lineage Visualizer

      • Import Wizard

      • SSIS Generator

      • Data Factory Generator

      • Data Vault Wizard

      • DSGVO Wizard

      • SSAS Tabular Generator

      • Easy Refactoring

      • …​

1.2. Quality Goals

  1. Fixing bugs is more important than implementing new features.

  2. A feature is only as good as its documentation.

1.3. Stakeholders

Table 1. Stakeholders
Role/Name Expectations

dwh developer

  • wants to work faster, more efficiently and more effectively

  • wants to automate boring work optionally

  • wants to decide for himself what he automates and what not

  • does not want to become a slave of the tools

dwh user

wants a well documented DWH

1.4. Alternatives

1.4.1. Data Warehouse Automation, repository based database generators

There are commercial tools that define the structure and function of relational and analytical databases via a repository in order to generate the code of these databases.

These tools are one-way: repository ⇒ target database.

One can work very successfully with these tools if one accepts the methods and approaches provided.

You can compare these tools with rail networks. As long as you are satisfied with these networks and the places connected to them, you can travel more or less effectively and efficiently with one or the other railroad in one or the other rail network. Problems arise when places or routes are missing or the connection between places becomes more cumbersome than one would like. Therefore, roads, trucks and cars also have great justification and importance. Sometimes there are even only footpaths.

My subjective praise of AnalyticsCreator can be found here (in German): Warum ich seit 2017 den AnalyticsCreator verwende.

2. 02 Architecture Constraints

  1. The project is based on Microsoft SQL Server (mssql):

    • current restriction: SQL Server 2017 or above.

      • initial restriction: SQL Server 2019 or above.
        because features of the Graph database were used, which are only available in SQL Server 2019.

      • Features of graph database are no longer used due to insufficient support by Visual Studio database projects and by DACPAC containing data. It should therefore be possible to use SQL Server 2017 (not tested).

  2. It must always be possible for BI developers to use all mssql and Azure features in DWH databases.

  3. a minimally invasive repository database supplements DWH database metadata

    • A repository database is externally connected to a DWH database via synonyms to system catalog views of the DWH database.

    • Changes can be made both in repository and in DWH database. Synchronization is required for this.

    • for additional minimal invasive interventions extended properties are used, which can be assigned to many database objects and also columns, parameters etc.

    • The synchronization concept takes into account that renaming and refactoring can be performed in the repository as well as in the DWH database.

    • The DWH database works without the repository and all traces of a repository can be removed (extended properties can be dropped).

    • It is possible to connect to the DWH database in read-only mode, without writing back and using extended properties. In this case, the database can be documented, for example.

    • SSAS Tabular databases can only be documented. This requires at least compatibility level 1200, since from this version on the model definition is expressed in JSON.

    • SSIS projects can only be documented.

3. 03 System Scope and Context

3.1. System Landscape

structurizr-SystemLandscape

3.2. DataHandwerk Development Context

structurizr-DataHandwerkDevelopment-SystemContext
structurizr-DataHandwerkgit-SystemContext

3.3. Customer Context

structurizr-DWHBDevelopment-SystemContext
structurizr-DWHBgit-SystemContext
structurizr-DWHBProduction-SystemContext

4. 04 Solution Strategy

  • The toolkit should fit into the usual working environment of a DWH developer. Therefore, the toolkit is based on repositories in the form of SQL Server databases. A developer can work with these databases the same way he works with other databases: with SSMS, Azure Data Studio = ADS, Visual Studio database projects and other tools.

  • There is one (1) repository per target DWH.

  • Repository database and DWH database are separate databases so that a DWH database is not unnecessarily filled up with ballast. This is especially important when the structure of the database is transferred to a production system.

  • all repository databases have the same structure, they differ per DWH database only in content. This simplifies maintenance and upgrades.

  • Repository databases are connected via synonyms on system views and procedures of the DWH database. These system views always have the same structure (within an SQL Server version). Synonyms can be easily deleted by script and created with reference to another database.

  • The current development is based on SQL Server version 2019. One reason is the improved support of graph databases. According to the current concept, the dependencies of objects should also be stored in graph databases.

  • It seems reasonable to add missing functionality of standard SQL Server tools (SSMS, ADS, VS) via one or more different additional GUI.

    • better input of multi-line strings

    • linking of tables during data entry (filters and lookup tables)

    • graphical interactive display of references (dependencies)

5. 05 Building Block View

5.1. Whitebox Overall System

structurizr-DataHandwerkDevelopment-Container
structurizr-DWHBDevelopment-Container
structurizr-DWHBProduction-Container

5.2. Level 2

5.2.1. DWH db Development (Project B)

structurizr-DWHBDevelopment-DWHdbDevelopmentProjectB-Component

5.2.2. Repository db Development (Project B)

structurizr-DWHBDevelopment-RepositorydbProjectB-Component

5.2.3. GUI (Project B)

structurizr-DWHBDevelopment-GUI-Component

6. 06 Runtime View

6.1. [repo].[usp_main]

ProcedurePumlTreeTable-ExecutionLog-usp_main
dhw-sequence-usp_main

6.1.1. [repo].[usp_sync_guid_RepoSchema]

dhw-sequence-usp_sync_guid_RepoObject

6.1.2. [repo].[usp_sync_guid_RepoObject]

dhw-sequence-usp_sync_guid_RepoObject

6.1.3. [repo].[usp_sync_guid_RepoObjectColumn]

dhw-sequence-usp_sync_guid_RepoObjectColumn

6.1.4. [reference].[usp_update_Referencing_Count]

dhw-sequence-usp_update_Referencing_Count

6.1.5. [repo].[usp_index_inheritance]

dhw-sequence-usp_index_inheritance

6.1.6. [repo].[usp_RepoObjectColumn_update_RepoObjectColumn_column_id]

6.1.7. [repo].[usp_GeneratorUsp_insert_update_persistence]

usp_GeneratorUsp_insert_update_persistence

6.2. <Runtime Scenario 2>

6.3. …​

6.4. <Runtime Scenario n>

7. 07 Deployment View

7.1. Infrastructure Level 1

<Overview Diagram>

Motivation

<explanation in text form>

Quality and/or Performance Features

<explanation in text form>

Mapping of Building Blocks to Infrastructure

<description of the mapping>

7.2. Infrastructure Level 2

7.2.1. <Infrastructure Element 1>

<diagram + explanation>

7.2.2. <Infrastructure Element 2>

<diagram + explanation>

…​

7.2.3. <Infrastructure Element n>

<diagram + explanation>

8. 08 Cross-cutting Concepts

8.1. <Concept 1>

<explanation>

8.2. <Concept 2>

<explanation>

…​

8.3. <Concept n>

<explanation>

9. 09 Design Decisions

10. Q10 uality Requirements

10.1. Quality Tree

10.2. Quality Scenarios

11. 11 Risks and Technical Debts

12. 12 Glossary

Term Definition

<Term-1>

<definition-1>

<Term-2>

<definition-2>


13. About arc42

arc42, the Template for documentation of software and system architecture.

By Dr. Gernot Starke, Dr. Peter Hruschka and contributors.

Template Revision: 7.0 EN (based on AsciiDoc), January 2017

© We acknowledge that this document uses material from the arc 42 architecture template, https://arc42.de, https://arc42.org/ Created by Dr. Peter Hruschka & Dr. Gernot Starke.