Data Profiling Components
To understand data profiling, you need to be familiar with the following components:
- PowerCenter Client.
- PowerCenter Data Profile.
- Data Profiling warehouse.
- Data Profiling reports.
PowerCenter Client. Use the PowerCenter Client to create and manage data profiles.
PowerCenter Data Profile. Metadata that you generate in the PowerCenter Client to define the types of statistics you want to collect for a source. It consists of a source definition, a profile mapping, and a profile session.
Data Profiling warehouse. Stores the results from profile sessions, along with the reports that you run to view those results.
Data Profiling reports. View data and metadata in Data Profiling reports.
Use the following PowerCenter Client tools to create and manage data profiles:
- Designer.
- Profile Manager.
Designer. Create data profiles from the Source Analyzer or the Mapplet Designer. When you create a data profile, the Designer generates a profile mapping based on the profile functions. The PowerCenter repository stores the profile mappings and metadata. If the repository is versioned, profile mappings are versioned in the same way as other PowerCenter mappings.
Profile Manager. A tool in the PowerCenter Designer that you use to manage data profiles. You can edit and regenerate profiles, run profile sessions, and view profile results.
PowerCenter Data Profile
A data profile contains the source definitions, the functions and function parameters, and the profile session run parameters. To create a data profile, you run the Profile Wizard from the PowerCenter Designer. When you create a data profile, you create the following repository objects:
- Profile.
- Profile mapping.
- Profile session.
Profile. A profile is a repository object that represents all the metadata configured in the wizard. You create the profile based on a mapplet or source definition and a set of functions.
Profile mapping. When you create a data profile, the Profile Wizard generates a profile mapping. Select functions in the wizard to help determine the content, structure, and quality of the profile source. You can use predefined or custom functions. The Profile Wizard creates transformations and adds targets based on the functions that you supply. You can view the profile mapping in the Mapping Designer.
Profile session. After the Profile Wizard generates a profile mapping, you provide basic session information, such as the Integration Service name and the connection information for the source and the Data Profiling warehouse. The Profile Wizard then creates a profile session and a profile workflow. You can run the profile session when the wizard completes, or you can run it later. When you run a profile session, the Integration Service writes the profile results to the Data Profiling warehouse.
While profiles are not versioned, the profile mappings and profile sessions are versioned objects.
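The relationship between these repository objects can be pictured with a small data model. The following Python sketch is purely illustrative and is not a PowerCenter API: every class name, field name, and the `m_` mapping-name prefix is a hypothetical choice made for this example. It only mirrors what is described above: a profile is built from a source and a set of functions, the wizard derives a mapping and a session from it, and only the mapping and session are versioned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProfileFunction:
    """A profile function and its parameters (hypothetical shape)."""
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class ProfileMapping:
    """Generated from the selected functions; a versioned object."""
    name: str
    functions: List[ProfileFunction]
    versioned: bool = True


@dataclass
class ProfileSession:
    """Basic session information supplied in the wizard; a versioned object."""
    integration_service: str
    source_connection: str
    warehouse_connection: str
    versioned: bool = True


@dataclass
class DataProfile:
    """All metadata configured in the wizard; the profile itself is not versioned."""
    name: str
    source: str
    functions: List[ProfileFunction]
    mapping: Optional[ProfileMapping] = None
    session: Optional[ProfileSession] = None
    versioned: bool = False


def create_profile(name, source, functions, integration_service,
                   source_connection, warehouse_connection):
    """Mimic the wizard: build the profile, then derive its mapping and session."""
    profile = DataProfile(name=name, source=source, functions=list(functions))
    # The "m_" prefix is an assumption for illustration, not a product convention.
    profile.mapping = ProfileMapping(name=f"m_{name}", functions=profile.functions)
    profile.session = ProfileSession(
        integration_service=integration_service,
        source_connection=source_connection,
        warehouse_connection=warehouse_connection,
    )
    return profile
```

For example, `create_profile("customers", "CUSTOMERS", [ProfileFunction("row_count")], "IS_dev", "src_conn", "dw_conn")` returns a profile whose `versioned` flag is false while its mapping and session are flagged as versioned. The function and connection names in that call are placeholders.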
Data Profiling Warehouse
The Data Profiling warehouse is a set of tables that stores the results from profile sessions. It also contains reports that you run to view the profile session results. You can create a Data Profiling warehouse on any relational database that PowerCenter supports as a source or target database. Create a Data Profiling warehouse for each PowerCenter repository in which you want to store data profiles.
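To make the idea of a results warehouse concrete, here is a minimal sketch that uses an in-memory SQLite database. It does not reproduce the actual Data Profiling warehouse schema: the table name `profile_results`, its columns, and the helper functions are all assumptions made for illustration. It shows only the general pattern of a session run writing function results to relational tables and a report reading them back.

```python
import sqlite3


def open_warehouse():
    """Create an in-memory stand-in for a warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE profile_results (
            run_id        INTEGER NOT NULL,
            profile_name  TEXT    NOT NULL,
            function_name TEXT    NOT NULL,
            result_value  TEXT
        )
        """
    )
    return conn


def write_results(conn, run_id, profile_name, results):
    """Store one session run's results, one row per profile function."""
    conn.executemany(
        "INSERT INTO profile_results VALUES (?, ?, ?, ?)",
        [(run_id, profile_name, fn, str(value)) for fn, value in results.items()],
    )
    conn.commit()


def latest_run_results(conn, profile_name):
    """Return the results of the most recent run for a profile as a dict."""
    rows = conn.execute(
        """
        SELECT function_name, result_value
        FROM profile_results
        WHERE profile_name = ?
          AND run_id = (SELECT MAX(run_id)
                        FROM profile_results
                        WHERE profile_name = ?)
        ORDER BY function_name
        """,
        (profile_name, profile_name),
    ).fetchall()
    return dict(rows)
```

A report that shows only the latest session run then reduces to a single query such as `latest_run_results(conn, "customers")`, where the profile name is again a placeholder.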
Data Profiling Reports
You can view the results of each function configured in the data profile. Depending on the type of metadata you want to view, use one of the following tools to view reports:
- Profile Manager.
- Data Analyzer.
Profile Manager. PowerCenter Data Profiling reports provide information about the latest session run. View them from the Profile Manager.
Data Analyzer. Data Analyzer Data Profiling reports provide composite, metadata, and summary reports. View them from the Data Profiling dashboard in Data Analyzer. You can also customize the reports in Data Analyzer.