Data Mart Pros and Cons

Independent Data Marts

Independent data marts are usually the easiest and fastest to implement and their payback value can be almost immediate. Some corporations start with several data marts before deciding to build a true data warehouse. This approach has several inherent problems:

  • While data marts have obvious value, they are not a true enterprise-wide solution and can become very costly over time as more and more are added.
  • A major problem with proliferating data marts is that, depending on where you look for answers, there is often more than one version of the truth.
  • They do not provide the historical depth of a true data warehouse.
  • Because data marts are designed to handle specific types of queries from a specific type of user, they are often less suited to “what if” queries than a data warehouse is.

Logical Data Marts

Logical data marts overcome most of the limitations of independent data marts. They provide a single version of the truth. There is no historical limit to the data, and “what if” querying is entirely feasible. The major drawback to logical data marts is the lack of physical control over the data. Because data in the warehouse is not pre-aggregated or dimensionalized, performance against the logical mart will not usually be as good as against an independent mart. However, use of parallelism in the logical mart can overcome some of the limitations of the non-transformed data.

Dependent Data Marts

Dependent data marts provide all the advantages of a logical mart and also allow for physical control of the data as it is extracted from the data warehouse. Because dependent marts use the warehouse as their foundation, they are generally considered a better solution than independent marts, but they take longer and are more expensive to implement.

Advantages of Using Detail Data

Until recently, most business decisions were based on summary data. The problem is that summarized data is not as useful as detail data and cannot answer some questions with accuracy. With summarized data, peaks and valleys are leveled: a peak that falls at the end of a reporting period is split across two periods and effectively cut in half.

Here’s another example. Think of your monthly bank statement that records checking account activity. If it only told you the total amount of deposits and withdrawals, would you be able to tell if a certain check had cleared? To answer that question you need a list of every check received by your bank. You need detail data.
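The bank statement example can be sketched in a few lines of Python. This is only an illustration: the check numbers, the amounts, and the helper name has_cleared are made up for the sketch and do not come from any real system.

```python
from decimal import Decimal

# Hypothetical detail data: one row per cleared check (check number, amount).
cleared_checks = [
    (1041, Decimal("250.00")),
    (1042, Decimal("75.50")),
    (1044, Decimal("120.00")),
]

# Summary data keeps only the total, like a statement with no line items.
total_withdrawals = sum(amount for _, amount in cleared_checks)


def has_cleared(check_number, detail_rows):
    """Answer the per-check question; this is only possible with detail rows."""
    return any(number == check_number for number, _ in detail_rows)


print(total_withdrawals)                   # the total says nothing about single checks
print(has_cleared(1042, cleared_checks))   # answered from the detail rows
print(has_cleared(1043, cleared_checks))   # check 1043 is not among the detail rows
```

The total can always be recomputed from the detail rows, but the detail rows cannot be recovered from the total.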

Decision support, that is, answering business questions, is the real purpose of databases. To answer business questions, decision-makers must have four things:

  • The right data
  • Enough detail data
  • Proper data structure
  • Enough computer power to access and produce reports on the data

Key Features of Teradata

The following are the key features of the Teradata Database:

  • Single data store
  • Scalability
  • Unconditional parallelism (parallel architecture)
  • Ability to model the business
  • Mature, parallel-aware Optimizer

Single Data Store

The Teradata Database acts as a single data store, with multiple client applications making inquiries against it concurrently.

Instead of replicating a database for different purposes, with the Teradata Database you store the data once and use it for many applications. The Teradata Database provides the same connectivity for an entry-level system as it does for a massive enterprise data warehouse.

Scalability

“Linear scalability” means that as you add components to the system, the performance increase is linear. Adding components allows the system to accommodate increased workload without decreased throughput. Linear scalability enables the system to grow to support more users, more data, more queries, and more complex queries without experiencing performance degradation. As the configuration grows, performance increases linearly, with a slope of 1. The Teradata Database was the first commercial database system to scale to and support a trillion bytes (one terabyte) of data.
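As a rough illustration of what a slope of 1 means, the sketch below models ideal linear scaling. The function names and the node and throughput figures are hypothetical; real systems are measured by benchmark, not computed this way.

```python
def ideal_throughput(base_nodes, base_throughput, new_nodes):
    """Throughput after growing the system, assuming ideal linear scaling."""
    return base_throughput * new_nodes / base_nodes


def ideal_elapsed(base_nodes, base_seconds, new_nodes):
    """Elapsed time for the same workload, assuming ideal linear scaling."""
    return base_seconds * base_nodes / new_nodes


# Hypothetical system: 4 nodes handling 1000 queries per hour.
print(ideal_throughput(4, 1000.0, 8))   # doubling the nodes doubles the throughput
print(ideal_elapsed(4, 60.0, 8))        # and halves the elapsed time of a fixed job
```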

The chart below lists the meaning of the prefixes:

Prefix    Exponent    Meaning
kilo-     10^3        1,000 (thousand)
mega-     10^6        1,000,000 (million)
giga-     10^9        1,000,000,000 (billion)
tera-     10^12       1,000,000,000,000 (trillion)
peta-     10^15       1,000,000,000,000,000 (quadrillion)
exa-      10^18       1,000,000,000,000,000,000 (quintillion)

The Teradata Database can scale from 100 gigabytes to more than 100 petabytes of data on a single system without losing any performance capability. The Teradata Database’s scalability provides investment protection for customers’ growth and application development. The Teradata Database is the only database that is predictably scalable in multiple dimensions, and this extends to data loading with the use of parallel loading utilities. The Teradata Database provides automatic data distribution, and no reorganizations of data are needed. The Teradata Database is scalable in multiple ways, including hardware, query complexity, and number of concurrent users.

Hardware

Growth is a fundamental goal of business. An MPP system easily accommodates that growth whenever it happens. The Teradata Database runs on highly optimized Teradata servers in the following configurations:

  • SMP – Symmetric multiprocessing platforms manage terabytes of data to support an entry-level data warehousing system.
  • MPP – Massively parallel processing systems can manage hundreds of petabytes of data. You can start with a couple of nodes, and later expand the system as your business grows.

With the Teradata Database, you can increase the size of your system without replacing:

  • Databases – When you expand your system, the data is automatically redistributed through the reconfiguration process, without manual interventions such as sorting, unloading and reloading, or partitioning.
  • Platforms – The modular structure allows you to add components to your existing system.
  • Data model – The physical and logical data models remain the same regardless of data volume.
  • Applications – Applications you develop for Teradata Database configurations will continue to work as the system grows, protecting your investment in application development.
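One way to picture automatic distribution and redistribution is a toy hash-placement scheme, sketched below. Everything here is an assumption made for illustration: the function names, the use of SHA-256, and the unit counts are not taken from the text above and are not Teradata's actual algorithm.

```python
import hashlib
from collections import Counter


def assign_unit(key, unit_count):
    """Map a row key to a processing unit with a stable hash (toy scheme)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % unit_count


def distribute(keys, unit_count):
    """Count how many rows land on each unit."""
    return Counter(assign_unit(key, unit_count) for key in keys)


keys = range(10_000)
before = distribute(keys, 4)   # hypothetical system with 4 units
after = distribute(keys, 8)    # the same rows after growing to 8 units

print(sorted(before.items()))
print(sorted(after.items()))
```

Because placement is computed from the key alone, growing the system only requires recomputing the placement. Nobody has to sort, unload, or repartition the rows by hand.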

Complexity

The Teradata Database is adept at complex data models that satisfy the information needs throughout an enterprise. The Teradata Database efficiently processes increasingly sophisticated business questions as users realize the value of the answers they are getting. It has the ability to perform large aggregations during query run time and can perform up to 128 joins in a single query.

Concurrent Users

As is proven in every Teradata Database benchmark, the Teradata Database can handle the most concurrent users, who are often running multiple, complex queries. The Teradata Database has the proven ability to handle from hundreds to thousands of users on the system simultaneously. Adding many concurrent users typically reduces system performance. However, adding more components can enable the system to accommodate the new users with equal or even better performance.

Unconditional Parallelism

The Teradata Database provides exceptional performance using parallelism to achieve a single answer faster than a non-parallel system. Parallelism uses multiple processors working together to accomplish a task quickly.

An example of parallelism can be seen at an amusement park, as guests stand in line for an attraction such as a roller coaster. As the line approaches the boarding platform, it typically will split into multiple, parallel lines. That way, groups of people can step into their seats simultaneously. The line moves faster than if the guests step onto the attraction one at a time. At the biggest amusement parks, the parallel loading of the rides becomes essential to their successful operation.

Parallelism is evident throughout a Teradata Database, from the architecture to data loading to complex request processing. The Teradata Database processes requests in parallel without mandatory query tuning. Its parallelism does not depend on limited data quantity, column range constraints, or specialized data models. The Teradata Database provides “unconditional parallelism,” meaning that there are no serial bottlenecks.
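The idea of several workers producing one answer can be sketched as a split, work, and combine pattern. The sketch below is a generic Python illustration, not Teradata code; the function names and the sample data are hypothetical. Threads are used only to keep the example self-contained, so it shows the structure of parallel work rather than a measured speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split(rows, parts):
    """Deal the rows out into `parts` slices of roughly equal size."""
    return [rows[i::parts] for i in range(parts)]


def parallel_total(rows, workers=4):
    """Each worker sums its own slice; the partial sums are then combined."""
    slices = split(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, slices))
    return sum(partials)


sales = list(range(1, 1001))   # hypothetical sale amounts 1..1000
print(parallel_total(sales))   # same answer as sum(sales)
```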

Teradata supports ad-hoc queries using ANSI-standard SQL, which allows Teradata to interface with third-party Business Intelligence (BI) tools and to accept queries submitted from other database systems.

Ability to Model the Business

A data warehouse built on a business model contains information from across the enterprise. Individual departments can use their own assumptions and views of the data for analysis, yet these varying perspectives have a common basis for a “single view of the business.”

With the Teradata Database’s centrally located, logical architecture, companies can get a cohesive view of their operations across functional areas to:

  • Find out which divisions share customers.
  • Track products throughout the supply chain, from initial manufacture, to inventory, to sale, to delivery, to maintenance, to customer satisfaction.
  • Analyze relationships between results of different departments.
  • Determine if a customer on the phone has used the company’s website.
  • Vary levels of service based on a customer’s profitability.

You get consistent answers from the different viewpoints above using a single business model, rather than functional models for different departments. In a functional model, data is organized according to what is done with it. But what happens if users later want to do some analysis that has never been done before? When a system is optimized for one department’s function, the other departments’ needs (and future needs) may not be met.

A Teradata Database models a customer’s business with data organized according to what it represents, not how it is accessed, so it is easy to understand. The data model should be designed without regard to usage and be the same regardless of data volume. With a Teradata Database as the enterprise data warehouse, users can ask new questions of the data that were never anticipated, throughout the business cycle and even through changes in the business environment.

A key Teradata Database strength is its ability to model the customer’s business. The Teradata Database supports business models that are truly normalized, avoiding the costly star schema and snowflake implementations that many other database vendors use. The Teradata Database can support star schema and other types of relational modeling, but Third Normal Form is the method for relational modeling that we recommend to customers. Our competitors typically implement star schema or snowflake models either because they are implementing a set of known queries in a transaction processing environment, or because their architecture limits them to that type of model. Normalization is the process of reducing a complex data structure into a simple, stable one. Generally this process involves removing redundant attributes, keys, and relationships from the conceptual data model. The Teradata Database supports normalized logical models because it is able to perform 128 table joins and large aggregations during queries.
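To make the normalized approach concrete, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in database. The table names, columns, and rows are hypothetical, and SQLite is not the Teradata Database. The point is only that data stored once, by what it represents, can answer different questions through joins and aggregation at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    amount      INTEGER NOT NULL   -- whole currency units
);
INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO product  VALUES (10, 'Loan'), (20, 'Card');
INSERT INTO sale     VALUES (100, 1, 10, 500), (101, 1, 20, 40), (102, 2, 20, 60);
""")

# One question: how much has each customer bought?
totals_by_customer = conn.execute("""
    SELECT c.name, SUM(s.amount)
    FROM sale AS s
    JOIN customer AS c ON c.customer_id = s.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# A different question against the same tables: how much did each product sell?
totals_by_product = conn.execute("""
    SELECT p.name, SUM(s.amount)
    FROM sale AS s
    JOIN product AS p ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(totals_by_customer)
print(totals_by_product)
```

Neither question required a table designed in advance for that question. Both are answered from the same three normalized tables.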

Mature, Parallel-Aware Optimizer

The Teradata Database Optimizer is the most robust in the industry, able to handle:

  • Multiple complex queries
  • Multiple joins per query
  • Unlimited ad-hoc processing

The Optimizer is parallel-aware, meaning that it has knowledge of system components (how many nodes, vprocs, etc.). It determines the least expensive plan (time-wise) to process queries fast and in parallel. The Optimizer is further explained in the next module.
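The notion of choosing the least expensive plan can be illustrated with a toy cost model. This is not how the Teradata Optimizer actually costs plans; the Plan class, the cost formula, and the numbers are hypothetical and exist only to show the general shape of a parallel-aware, cost-based choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    rows_scanned: int     # total rows the plan must read
    parallel_units: int   # units that share the work evenly


def estimated_cost(plan):
    """Toy cost: rows each unit must read when the work is shared evenly."""
    return plan.rows_scanned / plan.parallel_units


def cheapest(plans):
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


plans = [
    Plan("serial scan of the large table", 8_000_000, 1),
    Plan("parallel scan of the large table", 8_000_000, 8),
    Plan("serial scan of a smaller table", 2_000_000, 1),
]

best = cheapest(plans)
print(best.name, estimated_cost(best))
```

Knowing how many units can share the work changes the ranking. Without that knowledge, the smaller serial scan would look cheapest.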

How Is the Teradata Database Used in Data Warehouse

Each Teradata Database implementation can model a company’s business. The ability to keep up with rapid changes in today’s business environment makes the Teradata Database an ideal foundation for many applications, including:

Enterprise Data Warehouse

Data warehousing is a process for properly assembling and managing data from various servers to answer business-critical questions. The Teradata Database is ideal for enterprise data warehousing, which is commonly characterized by:

  • Multiple subject areas
  • Many concurrent users
  • Many concurrent queries, including ad-hoc queries
  • Large quantity of tables
  • Hundreds of gigabytes (and terabytes) of detail data
  • Historical data stored (months or years)

Without an enterprise data warehouse, a financial institution may be able to identify profitable customers for separate products such as mortgages or credit cards, but not know the overall profitability of each customer. An enterprise data warehouse brings together the different subject areas into a central repository, creating “one single view of the business” for a complete picture of the customer.
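The financial institution example can be sketched as follows. The customer identifiers, the profit figures, and the overall_profit helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-product profit by customer, as separate systems might hold it.
mortgage_profit = {"C1": 1200, "C2": -300}
card_profit = {"C1": -1500, "C2": 450, "C3": 80}


def overall_profit(*subject_areas):
    """Combine per-product profit into one figure per customer."""
    totals = defaultdict(int)
    for area in subject_areas:
        for customer, profit in area.items():
            totals[customer] += profit
    return dict(totals)


combined = overall_profit(mortgage_profit, card_profit)
print(combined)
```

Customer C1 looks profitable when only mortgages are considered, yet is unprofitable overall. That fact is visible only once the subject areas are brought together.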

An enterprise data warehouse environment built on the Teradata Database simplifies the system maintenance task, resulting in a lower total cost of ownership. In addition, the Teradata Database’s ability to handle large-scale, decision-support queries against huge volumes of detail data makes it the obvious choice for companies wanting to start at any level and grow.

Active Data Warehouse

The active data warehouse extends a company’s ability beyond historical data and strategic decisions to bring the decision-making capability to front-line personnel. Tactical decisions such as “Who should get the empty seat on this airplane?” or “What should I offer this customer to keep her from leaving, based on her history with our company?” can be made more effectively with the right information.

With an active data warehouse, employees who interact directly with customers and suppliers are empowered with information-based decision making at their fingertips. The Teradata Warehouse supports active data warehousing with:

  • Capability to handle thousands of additional users and mixed workloads
  • High availability and reliability to support mission-critical applications
  • Scalability to accommodate an increase in the amount of data, the number of data sources, and the number of applications supported in the data warehouse environment

Customer Relationship Management

Customer Relationship Management solutions help companies capture and analyze data to maximize customer acquisition, retention, and profitability. You can use the Teradata Database’s detailed data and analysis capabilities to identify and optimize business relationships with the highest potential of profitability and growth. Examples include:

  • A telephone company can conduct and refine marketing programs targeted at a certain type of profitable customer.
  • A supermarket can create incentives based on specific combinations of products that customers tend to buy together.
  • A bank can recognize changes in a customer’s life circumstances, such as a new baby or a college-bound son or daughter, and offer timely services such as a new home loan, mortgage insurance, additional checking account, extra credit card, or student loan.
  • A retailer can run a department store credit card sales program and filter out those customers who already have that card.

Teradata’s CRM solution, Teradata Relationship Manager, consists of software, professional and customer services, and the Teradata Database to create, maintain, and enhance customer relationships.

Internet and E-Business

The Teradata Database provides a single repository for customer information that helps E-Businesses build and maintain one-to-one customer relationships that are critical to their success on the Internet. The Teradata Database supports the fast-paced style of E-Business by allowing many concurrent users to ask complicated questions as they think of them — and get quick answers.

The Teradata Database allows E-Businesses to:

  • Capture massive amounts of click-stream data.
  • Enable multiple users to ask complex questions of the customers’ click-stream data with near real-time response.
  • Protect customers’ privacy with consumer opt-in/opt-out preferences and the ability for consumers to check and revise their information stored on the Teradata Database through the Internet or a company call center.

Data Mart in Data Warehouse

A data mart is a special purpose subset of a company’s enterprise data used by a particular department, function, or application. Often, these single-subject area data marts contain data that was aggregated or transformed in some way to better handle the requests of a specific user community. Vendors implement data marts using different architectures:

  • Independent data marts – Created directly from operational systems and loaded into an individual data store.
  • Dependent data marts – Created from detail data in the data warehouse. They still require movement and transformation of data, but may provide better performance for some specific user queries.
  • Logical data marts – Existing parts of the data warehouse, not separate physical structures. Because in theory the data warehouse contains the detail data of the entire enterprise, a logical data mart would then provide the specific information for a specific user community. With the proper technology, this can be an ideal way to remove the need for massive data loading and transforming.

Independent and dependent data marts are architectures endorsed by other database vendors and tend to be associated with higher maintenance costs for physically moving and maintaining the data, inconsistent data (and resulting inconsistent decisions), and indirect ways to get the complete picture of the data. The Teradata Database is ideal for the logical data mart environment, where different user communities view subsets of a single repository of enterprise data.
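A logical data mart can be pictured as a view over the warehouse's detail data. The sketch below uses Python's built-in sqlite3 module as a stand-in database; the table, the view, and the rows are hypothetical, and the SQL is SQLite's dialect rather than Teradata's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_detail (
    sale_id    INTEGER PRIMARY KEY,
    region     TEXT    NOT NULL,
    department TEXT    NOT NULL,
    amount     INTEGER NOT NULL
);
INSERT INTO sales_detail VALUES
    (1, 'East', 'Toys',   30),
    (2, 'East', 'Garden', 45),
    (3, 'West', 'Toys',   25);

-- The "mart" is a view: no data is copied or loaded separately.
CREATE VIEW toys_mart AS
    SELECT region, SUM(amount) AS total_amount
    FROM sales_detail
    WHERE department = 'Toys'
    GROUP BY region;
""")


def toys_totals():
    """Query the logical mart as the Toys department would."""
    return conn.execute(
        "SELECT region, total_amount FROM toys_mart ORDER BY region"
    ).fetchall()


before = toys_totals()
conn.execute("INSERT INTO sales_detail VALUES (4, 'West', 'Toys', 10)")
after = toys_totals()   # the new detail row is visible without any reload

print(before)
print(after)
```

Because the view reads the single copy of the detail data, the mart cannot drift out of step with the warehouse. The trade-off is that the aggregation is computed at query time.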
