Data Warehouse Concepts
- Online Transaction Processing.
- Its fully transactional based
- Transaction should be made as soon as possible
- It’s a client server model
- It handles with current data.
- The SQL statements retrieves only small amount of data.
- Data’s are in the form on “Normalized data”
- Mostly used SQL statements are INSERT,UPDATE,DELETE
- Heterogeneous data
- Examples: Online Reservation System, Withdraw money from ATM and etc.,
- Online Analytical Processing
- Otherwise called as “Decision Support System”
- Deformalized data
- Mainly focuses on database
- Its fully analysis and report oriented
- Main users are Management and business people
- It handles Historical Data
- Only Bulk Load
- Homogeneous data
Difference between OLTP and OLAP
|Online Transaction Processing||Online Analytical Processing|
|OLTPs are the original source of the data.||OLAP data comes from the various OLTP Databases|
|It is transaction oriented.||It is Analytical and report oriented.|
|Normalized data.||De normalized data.|
|Handles current data||Handles historical data|
|Operations like insert, update, delete can be performed.||Since it is mainly used for bulk loading of data we mainly perform the insert operation.Update and delete operations are performed in some cases where it is necessary.|
|Select query is used to retrieve only minimal amount of data.||Select query is used to retrieve bulk volume of data.|
|Used by large number of end users.||Used by minimal number of users.|
|Consist of heterogeneous data.||Consist of homogenous data.|
Tasks Performed When data coming from Live to data warehouse
- DE normalizing the data.
- Data cleansing.
- Data trimming
- Removing duplicate records.
- Decoding the data.
- Calculating the data.
- Data validation if required
- Null Value replacement.
- Data type conversion.
Life Cycle of Data Warehousing Project:
- Project Planning
- Budget Allocation.
- Internal resource allocation
- BRS (Business Requirement Study) which involves
- Dimensional modeling
- This involves conversion of OLTP tables to OLAP tables.
- This is performed by Dimension modeler.
- Physical Design
- Application Development
- Deployment and training
Teams involved in Data warehousing:
- Data Dimension modeler.
- Report Developer
- Report Admin
- ETL developer
- ETL admin
- System Admin.
- Managing Historical data and helps the management to take right decisions at right time
- A DW is Subject oriented, nonvolatile and integrated collection of data to support management
- Data mart is a subset of data warehouse
DWH Design Approach
- Top Down Approach
- Bottom Up Approach
- De normalized data
- Must have one primary key
- Contains less number of records than fact table
- It contains large number of columns
- Contains foreign key to all the dimension table
- Contains large number of data than the dimension table
Dimension model type
- Star Schema
- Snow flake schema
- Fact table is surrounded by many dimension table