Data Warehouses for Business: When a Simple Data Repository Becomes a Competitive Advantage
In 1986, Barry Devlin and Paul Murphy, two IBM Europe engineers, set out to solve a problem that was plaguing major corporations of the era: operational data existed, but extracting actionable management insights from it was virtually impossible. Transactional systems captured the current state of a business, yet they were incapable of answering whythings happened or what would happen next. Devlin and Murphy designed an internal architecture for IBM EMEA and, in 1988, published a landmark paper that introduced the concept of a “business data warehouse.”
A term that had never existed before became the industry standard within a few years.
Nearly forty years later, the problem hasn’t vanished—it has simply scaled. Instead of isolated mainframes, modern companies now juggle CRMs, ERPs, marketing platforms, production systems, and dozens of Excel files that have “temporarily” become their primary analytical tool. Data is everywhere, yet answering a simple analytical question can still take the finance department three days of manual labor.
Below, we break down how a modern data warehouse is structured in Ukraine: when it becomes a business necessity, and what that journey looks like in practice.
What is a Data Warehouse and How Does It Differ from a Database?
Every company stores data somewhere, but the fundamental question is what that data is actually used for. Systems like CRM or ERP are built for a very simple task: write data quickly and retrieve it just as fast. For instance, a manager opens a client profile and gets the information in a split second. A deal goes through, and the system records it.
Everything seems ideal… right up until a higher-level question arises: “What has been happening to our margin by product category over the last two years, broken down by region and sales channel?”
Operational databases are not designed for this. They will either freeze up, slow down performance for other users, or spit out raw data that still needs to be reconciled manually.
A Data Warehouse (DWH) is a dedicated analytical environment within a larger data architecture. It ingests data from various systems, cleanses it, and transforms it into a unified structure for subsequent analysis.
DWH vs. OLTP vs. Data Lake
These three concepts are frequently confused:
When a Business Needs a Data Warehouse
There is no universal revenue threshold or specific number of systems that automatically triggers the need for a DWH. However, there are distinct symptoms:
- The financial report takes more than two days to prepare and requires manual data consolidation from several systems.
- Different departments come to the meeting with different figures for the same metric.
- An analyst spends a significant portion of their time (often more than half) collecting data.
- Any non-standard request from management turns into a separate project lasting several days.
- The company is scaling up, and the complexity of the data is growing faster than the infrastructure.
Modern Data Warehouse Architecture
Layers: Staging → Core → Data Mart
A modern analytical warehouse is constructed in layers, with each tier having a distinct role and a clear scope of responsibility.
-
Staging (Bronze layer): The first level, where data “lands” exactly as it came from the source systems—without transformations, cleansing, or business logic. Think of it as an insurance policy: if something goes wrong during subsequent stages, you can always return to the raw originals and rerun the entire process.
-
Core (Silver layer): This is where the heavy lifting happens. Data is cleared of duplicates and technical artifacts, standardized into unified formats, enriched with reference data, and interconnected. A customer record from the CRM and that same customer from the ERP are merged into a single profile with a unique identifier. Product items from different systems are given a standardized name. This layer serves as the company’s ultimate “single source of truth.”
-
Data Mart (Gold layer): Subject-oriented data marts optimized for specific analytical tasks. Examples include a financial data mart for the CFO, a marketing mart for the CMO, or an operational mart for production teams. Queries here execute rapidly because the data has already been aggregated to answer specific business questions.
This three-tier architecture delivers two major advantages simultaneously: stability (changes in source systems do not break the downstream analytics) and flexibility (new data marts can be added without dismantling or rebuilding the underlying foundation).
ETL vs. ELT Approaches
-
ETL (Extract → Transform → Load): A classic approach where data is first extracted from sources, transformed on a separate server, and only then loaded into the warehouse. It worked well in the era of expensive cloud computing and small data volumes.
-
ELT (Extract → Load → Transform): A modern approach where data is first loaded into the warehouse in its raw form, and transformations take place directly inside it. This became possible thanks to cloud platforms that possess sufficient computing power to handle transformations directly at the warehouse level.
For most new projects today, ELT is the preferred choice. Cloud platforms often make it easier to implement and scale, but the cost depends on query efficiency and processing volume. Classic ETL processes remain relevant where legacy infrastructure or strict requirements dictate what data can leave the corporate network perimeter.
Top Cloud DWH Platforms
Choosing a platform is one of the key decisions that is difficult to reverse later. A clear pool of leaders has already formed in the cloud DWH market: BigQuery and Snowflake are most often considered as baseline options in Ukraine, alongside Redshift and Azure Synapse.
Google BigQuery
BigQuery is a serverless platform from Google where there is no need to manage clusters or plan scaling: resources are allocated automatically for a specific query. It is built on its proprietary Dremel engine, which executes parallel columnar queries across thousands of nodes simultaneously.
-
Advantages: Minimal operational overhead, native integration with the Google ecosystem (Looker Studio, Vertex AI, Google Analytics), and a pay-per-use model for actually processed data. Convenient for fluctuating workloads.
-
Limitations: The on-demand model can result in unpredictable bills with unoptimized queries. Requires discipline in writing SQL. Vendor lock-in to Google Cloud.
-
Suitable for: Companies with uneven workloads, projects within Google Cloud, and teams without a dedicated infrastructure administrator.
Snowflake
Snowflake is built on the idea of complete separation of storage and compute, allowing them to scale independently of each other. The platform deploys on top of AWS, Azure, or Google Cloud, making it a multi-cloud solution with no lock-in to a specific provider.
-
Advantages: Multi-cloud architecture, convenient data sharing between organizations via the Snowflake Marketplace, and a managed cost model through virtual warehouses with flexible scaling.
-
Limitations: Higher baseline cost compared to BigQuery for small data volumes. Requires planning the size of virtual warehouses to match the workload.
-
Suitable for: Large corporations, multi-cloud environments, and companies that need to share data between business units.
Amazon Redshift
Redshift is a solution from AWS that is deeply integrated with the Amazon ecosystem: S3, Glue, SageMaker, QuickSight. It exists in two variants: classic provisioned and the newer Serverless with automatic scaling.
-
Advantages: Mature tooling for ETL processes via AWS Glue, a broad community, and a large number of integrations within the AWS ecosystem.
-
Limitations: The classic variant requires cluster planning and manual management when workloads change. Serverless addresses part of these limitations.
-
Suitable for: Companies whose infrastructure is already on AWS, and projects with a stable and predictable workload.
Microsoft Azure Synapse Analytics
Azure Synapse combines DWH, data integration, and big data analytics into a single platform. For companies in a Microsoft environment, this means native operations with Power BI, Azure Active Directory, and Microsoft Fabric without additional integration layers.
-
Advantages: A single environment for SQL queries and Spark compute, deep integration with Power BI, and a familiar interface for teams within the Azure ecosystem.
-
Limitations: Broader functionality simultaneously means greater configuration complexity. For small projects, it can be an overkill solution.
-
Suitable for: Enterprises in a corporate Microsoft environment, and projects where Power BI is the primary visualization tool.
IWIS Case Study: Data Warehouse for BAT
British American Tobacco is a transnational company with a presence in over 180 countries and a staff of over 50,000 employees. The scale of the business means a scale of data: dozens of sources, thousands of metrics, and several analytical teams with different needs.
Problem Statement
In the first phase, the team conducted an audit of existing Power BI reports and identified issues in calculations and the use of platform tools. Prior to collaborating with IWIS, BAT’s analytical reporting operated unstably: reports were occasionally unavailable, metrics were recalculated with delays, and system performance drew complaints. The diagnostics revealed a systemic problem: data was transported and transformed without intermediate storage. In fact, every request to Power BI triggered a full cycle of data extraction and transformation from the sources in real time. Any failure at any step meant the report simply would not open. A separate issue was the lack of an orchestration layer for ETL processes: no one monitored the connections to sources, the completion of loads, or the quality of the output data. The system operated as a black box.
Architectural Solution
The IWIS team developed several architecture options, each of which involved building a business-level data warehouse, but using different tools. The client chose the optimal platform taking into account their corporate infrastructure. The implementation included two levels:
-
Bronze layer (raw data): Extraction mechanisms with automatic saving of raw data were configured for each source. Data is no longer transformed on the fly; it first lands in an unchanged state.
-
Silver layer (cleaned data): A separate transformation process cleans and structures data from the Bronze level. Power BI no longer pulls data directly from sources; it works with the already prepared layer.
Separately, an infological data model was built: a description of all key entities, their relationships, and flows between systems. This simplified subsequent maintenance and became the foundation for scaling. The most critical component of the architecture is the orchestration mechanism. The system controls the execution of each ETL task according to a schedule, compares data increments between the Bronze level and the source in a parallel manner (independent of the extraction process itself), and automatically sends notifications to email and Telegram if discrepancies or failures are detected.
Results
The IWIS and BAT team went from unstable reporting to a fully functional two-tier warehouse architecture in 12 months. The specific results:
- Power BI is no longer directly dependent on source systems.
- Data updates have become more predictable thanks to an incremental approach.
- Any ETL failure is now automatically logged.
- The infological model provided the BAT team with a documented foundation for independent system development.
The Gold tier (Data Mart) had not been implemented by the time the project was completed, as the Silver tier met the existing requirements. However, the architecture is designed in such a way that adding the Gold tier does not require a redesign of the previous layers.
How Much Does DWH Development Cost in Ukraine?
When it comes to data warehouse development, the question of cost is always one of the first to arise. And the answer is always the same: it depends. However, there are benchmarks that help form a realistic budget.
The development cost consists of several components:
-
Analysis and Architecture Design: The Discovery phase, where the team studies data sources, their quality, structure, and interrelations, builds an infological model, and designs the technical solution. Skipping this phase to write code immediately is the most common mistake, and it proves costly during rework.
-
Development and Configuration: Building the warehouse layers, setting up ETL/ELT processes, orchestration, and data quality control. This constitutes the largest portion of the budget.
-
Platform Licenses: These depend on the chosen solution. Cloud platforms (BigQuery, Snowflake, Redshift, Azure Synapse) operate on a pay-as-you-go model, meaning initial costs are small, but they grow alongside data volume and the number of queries.
-
Support and Evolution: After launch, the system requires monitoring, updates, and the addition of new sources or data marts.
The estimates are based on Ukrainian market practices and global benchmarks, as most DWH projects are implemented using international cloud platforms. Here are approximate cost guidelines:
These figures serve as a starting point. The actual cost is determined after the Discovery phase, once the quality and structure of the source data are clear. This clarity can either reduce the budget or increase it several times over. This is where surprises most frequently occur: a legacy ERP system with decade-old documentation, or data formats that no one warned about.
Mistakes in Building a Data Warehouse
Most problems in DWH projects are organizational or architectural. Here are the most common ones:
-
Skipping the Discovery Phase: The team begins development without a full understanding of data sources, their quality, and business logic. The result is late-stage rework that costs multiple times more than the initial analysis.
-
Lack of a Data Owner: A DWH is an infrastructure that requires constant attention. If a company lacks a dedicated person responsible for data quality and relevance, the system gradually degrades.
-
Attempting to Build Everything at Once: The desire to create the “perfect warehouse” on the first try leads to lengthy projects with no intermediate results. The correct approach is iterative: launch a basic architecture, realize value, and then evolve.
-
Ignoring Input Data Quality: “Garbage in, garbage out” is a literal description of what happens when source data fails validation. Analytics based on dirty data is harmful because it creates a false confidence in numbers.
-
Choosing a Platform Before Understanding Needs: The platform should be determined by the architecture and needs, not the other way around.
Where to Start: A Business Readiness Checklist
Before starting DWH development, it’s important to honestly answer a few questions:
- Do you have a clear understanding of the business challenges the data warehouse needs to address?
- Is there an inventory of data sources: a list of systems, formats, and responsible parties?
- Is there anyone from the business side who is willing to participate in Discovery and validate the logic behind the metrics?
- Is there an understanding of the quality of the current data? Are we aware of where there are duplicates, gaps, or inconsistencies?
- Is there a budget not only for development but also for ongoing support?
- Is there a technical team or contractor that will be responsible for the system after it goes live?
If the answer to most of these questions is “no,” you should start with a data audit and requirements gathering. This is the first step in a well-structured data warehouse project.
Free Data Architecture Consultation
IWIS specializes in building analytical infrastructure for businesses: from data source audits to full-fledged data warehouses with ETL processes and BI reporting. Our clients include many companies of various sizes that have made the transition from fragmented data to a single source of truth.
If you recognize your situation in this article, contact us for a free data architecture consultation. We’ll analyze your infrastructure and suggest specific next steps.
Interesting materials for you