Business Intelligence is the activity which contributes to the growth of any company. BI is the act of transforming raw/operational data into useful information for business analysis.
How Does Business Intelligence Work?
There is a certain format on which everything works and same applies to Business Intelligence. Wanna know them? Here they are…
- BI based on Data Warehouse technology extracts information from a company’s operational system.
- The data is transformed (cleaned and integrated) and loaded into the data warehouse.
- Since this data is credible.It is used for business insights.
Why Data Warehousing?
Data Collected From various sources and stored in various databases cannot be directly visualized. The data first needs to be integrated and then processed before visualization takes place.
Data Warehousing: In Brief
A central location where consolidated data from multiple locations(databases) are stored. Data Warehousing is maintained separately from an organization’s operational database. End users access it whenever any information is needed.
Note: Data Warehouse is not loaded every time new data is added to the database.
- The strategic question can be answered by studying trends.
- Data Warehousing is faster and more accurate.
Note: Data Warehouse is not a product that a company can go and purchase, it needs to be designed and depends entirely on the company’s requirement.
“A data warehouse is a subject -oriented, an integrated, time-variant and non-volatile collection of data in support of managements’ decision-making process”-Bill Immon Father of data warehousing.
Subject-oriented: Subject-oriented usually contains categorized data and stored by business subject rather than by its application.
Integrated: Integrated data on a given subject is collected from different sources of data collection and stored in a single place.
Time-variant: Organized data is stored as a service of snapshots, each one of data representing a period of time.
Non-volatile: Typically data in the data warehouse is not updated or deleted.
Key Terminologies related to DWH Architecture
OLTP Vs OLAP
Information Technology systems, that can be divided into two parts one is transactional (OLTP) and another is analytical (OLAP). Generally, we can imagine that OLTP systems provide source data to a different stage of data warehouses, whereas OLAP systems help to analyze it.
On-line Transaction Processing (OLTP)
OLTP is defined as a large number of short online records (INSERT, UPDATE, DELETE). On-line Transaction processing system provides fast data modification, data processing, data duplicacy in multi-access of time and an effectiveness measured by a number of records per second.
In this database, there is detailed and current data, and structure used to store transnational databases as entity model (usually 3NF).
Example: All bank transactions made by a customer.
Below are few pointers for its deeper understanding:
- Contains current data.
- Useful in running the business.
- Based on entity-relationship model.
- Provides primitive and highly detailed data.
- Used for writing data into the database.
- Database size ranges from 100MB to 1GB.
- Fast provides, high performance.
- A number of records accessed is in tens.
On-line Analytical Processing (OLAP)
This term is defined by the relatively low volume of transactions. Queries are generally complex and involve aggregation. In this type of OLAP systems, a response time is an effectiveness measure, applications are widely used by Data Mining techniques.
In OLAP database there is aggregated, historical data, stored in multi-dimensional schemas (usually star schema).
- Contains historical data.
- Useful in analyzing the business.
- Based on star, snowflake and the fact constellation schema.
- Provided summarized consolidated data.
- Use for reading data from the data warehouse.
- Data warehouse size ranges from 100GB to 1TB.
- Highly flexible but not fast.
- A number of records accessed is in millions.
Example: Bank transaction made by a customer at a particular time.
Process of Data Warehouse
ETL stands for Extract, Transform and Load. An ETL tool extracts the data from different R-DBMS source, transforms the data like applying calculations etc. and then load the data to Data Warehouse system., in the form of dimension and fact tables.
- To perform extraction process staging area is required during ETL load.
- The source systems are available for a particular period of time to extract data. The time period is less than the total data-load time. Such that, staging area allows to extract the data from the source of the system and keeps it in the staging area before the time slot ends.
- The staging area is required to get the data from multiple data sources together or to join two or more systems together.
- Data extractions’ time slot for different systems varies as per the time zone and operational hours.
- Extracted data from a source of systems can be used in multiple data warehouse system, Operation Data stores, etc.
- ETL allows to perform complex transformations and requires the extra area to store the data.
In data transformation, a set of functions are applied on extracted data to load it into the target system. A single unit of data, which does not require any transformation in organized data is known as direct move or pass through data.
During Load phase, data is loaded into the end-target system or destination system and it can be a flat file or a Data Warehouse system.
Data Mart is a similar version of data warehouse, which deal with a single subject. It focuses only one area. Time taken to build data mart is very less compare to build a data warehouse. Data marts are confined to subjects.
Key Points About Data Marts
- Windows-based or Unix/Linux-based servers are used to implement data marts, implemented on low-cost servers.
- The life cycle for data marts may be complex in the long run if their planning and design are not organization-wide.
- Data marts are small in size.
- Data marts are customized by the department.
- Data marts are flexible.
Metadata is the summarized data that leads us to the detailed data. Simply defined as data about data. The data that are used to represent other data is known as metadata. E.g, the index of a book which serves as a metadata for the contents of the book.
The structure of data cube is defined as OLAP to achieves the multidimensional functionality. A data cube is used to represent data along with some measure of interest. It allows looking complex data in a very simple way.
Example: Suppose a company wants to keep track of sales records with the help of sales data warehouse with respect to time, item, branch, and location. These dimensions allow to keep track of monthly sales and at which branch the items were sold. There is a table associated with each dimension. This table is known as a dimension table. Let’s take an example, A single dimension of any data table may have attributes such as item_name, item_type, and item_brand.
The following table represents the 2-D view of Sales Data for a company with respect to time, item, and location dimensions.
Thereafter, in the 2-D table, we have multiple records with time-bound and items only. The sales for New Delhi are shown with respect to time, and item dimensions according to the type of items sold.
Whenever, If we need to view the complete sales data with one more dimension, let’s say, the location dimension, then the 3-D view would be useful. The 3-D view of the sales data with respect to time, item, and location is shown in the table below −
In above 3D table, That can be represented as 3D data cube as shown below −
Three-Tier Data Warehouse Architecture
Generally, a data warehouse adopts a three-tier architecture. Below are the three tiers of the data warehouse architecture.
1. Bottom Tier
This tier constitutes data warehouse, data marts, metadata, monitoring and administration.This tier is a warehouse database server that is almost always a relational database system.Data is fed to this tier from operational databases and external source using back-end tools and utilities.These tools and utilities first perform extract, transform, load and refresh functions on the data.
2. Middle Tier
In OLAP Server that allows implementation in either of the following ways.
In Relational term of OLAP (ROLAP), which is an extended relational database management system. The ROLAP defines multiple operations on the multidimensional data model to analyze relational data operations.
Multidimensional (MOLAP) model that directly implements the multidimensional data and operations.
Top tier is the last layer of data warehouse structure. It is the front-end client layer. this layer holds the query tools and reporting tools, analysis tools and data mining tools.
The below diagram shows the 3-tier architecture of data warehouse−
Data Warehousing is not a new phenomenon. All large organizations already have data warehouses, but they are just not managing them. In the coming years, the demand for data warehousing is going to become enormous with new products, Services, and technologies coming out individually.
In order to get the most out of this period, it is going to be important that data warehouse planners and developers have a clear idea of what they are looking for and then choose strategies and methods that will provide them with the performance today and flexibility for tomorrow.