Data Profiling Process

After you create a data profile, you can run the profile session and view the results in a report.

The following steps describe the data profiling process:

  • Create a data profile.
  • Run the profile session.
  • View the reports.

1. Create a data profile. Use the Profile Wizard in the Designer to create a data profile based on a source definition and a set of functions. The Profile Wizard generates a mapping and a session based on criteria that you provide.

2. Run the profile session. You can choose to run the profile session when you finish the Profile Wizard, or you can run it from the Profile Manager. The Integration Service runs the session and loads the profile results to the Data Profiling warehouse.

3. View the reports. View the Data Profiling report associated with the profile session. Based on the type of profile report, you can view reports from the Profile Manager or from Data Analyzer.
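Because the Profile Wizard generates an ordinary session and workflow, step 2 can also be scripted once the objects exist. A minimal sketch using pmcmd, assuming hypothetical service, domain, folder, and workflow names, with the password exported in the INFA_PWD environment variable:

# Start a generated profile workflow from the shell (all names are placeholders).
# -pv reads the password from the INFA_PWD environment variable.
pmcmd startworkflow -sv MY_INT_SERVICE -d MY_DOMAIN -u admin -pv INFA_PWD -f MY_FOLDER wf_profile_customers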

 


Data Profiling Connectivity

 

PowerCenter Data Profiling uses the following types of connectivity:

  • TCP/IP.
  • Native.
  • ODBC.
  • JDBC.

TCP/IP. The PowerCenter Client and the Integration Service use TCP/IP to communicate with the Repository Service.

Native. The Integration Service uses native database connectivity to connect to the Data Profiling warehouse when it loads target data from the profiling sessions.

ODBC. The PowerCenter Client uses ODBC to connect to the Data Profiling warehouse when you run data profiling reports from the Profile Manager.

JDBC. Data Analyzer uses JDBC to connect to the Data Profiling warehouse when you run data profiling reports.
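If reports fail to load, each connectivity layer can be checked independently from the shell. A hedged sketch, assuming a unixODBC installation and placeholder host, port, and DSN names:

# Check TCP/IP reachability of the domain gateway (host and port are placeholders).
nc -z infa-gateway.example.com 6005 && echo "gateway reachable"

# Check the ODBC DSN that the Profile Manager would use to reach the
# Data Profiling warehouse (DP_WAREHOUSE is a placeholder DSN).
isql -v DP_WAREHOUSE db_user db_password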

 

Data Profiling Components

 

To understand data profiling, you need to be familiar with the following components:

  • PowerCenter Client.
  • PowerCenter Data Profile.
  • Data Profiling warehouse.
  • Data Profiling reports.

PowerCenter Client. Use the PowerCenter Client to create and manage data profiles.

PowerCenter Data Profile. Metadata that you generate in the PowerCenter Client that defines the types of statistics you want to collect for a source. It consists of a source definition, a profile mapping, and a profile session.

Data Profiling warehouse. The Data Profiling warehouse stores results from profile sessions and reports that you run to view the results.

Data Profiling reports. View data and metadata in Data Profiling reports.

PowerCenter Client

 

Use the following PowerCenter Client tools to create and manage data profiles:

  • Designer.
  • Profile Manager.

Designer. Create data profiles from the Source Analyzer or the Mapplet Designer. When you create a data profile, the Designer generates a profile mapping based on the profile functions. The PowerCenter repository stores the profile mappings and metadata. If the repository is versioned, profile mappings are versioned in the same way other PowerCenter mappings are versioned.

Profile Manager. A tool in the PowerCenter Designer that you use to manage data profiles. You can edit and regenerate profiles, run profile sessions, and view profile results.

 

PowerCenter Data Profile

 

A data profile contains the source definitions, the functions and function parameters, and the profile session run parameters. To create a data profile, you run the Profile Wizard from the PowerCenter Designer. When you create a data profile, you create the following repository objects:

  • Profile.
  • Profile mapping.
  • Profile session.

Profile. A profile is a repository object that represents all the metadata configured in the wizard. You create the profile based on a mapplet or source definition and a set of functions.

Profile mapping. When you create a data profile, the Profile Wizard generates a profile mapping. Select functions in the wizard to help determine the content, structure, and quality of the profile source. You can use predefined or custom functions. The Profile Wizard creates transformations and adds targets based on the functions that you supply. You can view the profile mapping in the Mapping Designer.

Profile session. After the Profile Wizard generates a profile mapping, you provide basic session information, such as the Integration Service name and connection information for the source and the Data Profiling warehouse. The Profile Wizard creates a profile session and a profile workflow. You can choose to run the profile session when the wizard completes, or you can run it later. When you run a profile session, the Integration Service writes profile results to the Data Profiling warehouse.

While profiles are not versioned, the profile mappings and profile sessions are versioned objects.
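Because the generated mapping and session are ordinary repository objects, you can confirm they exist from the command line after the wizard finishes. A small sketch using pmrep listobjects, assuming a prior pmrep connect has succeeded and PROFILE_FOLDER is a placeholder folder name:

# List the mappings and sessions in the folder that holds the profile objects.
pmrep listobjects -o mapping -f PROFILE_FOLDER
pmrep listobjects -o session -f PROFILE_FOLDER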

Data Profiling Warehouse

 

The Data Profiling warehouse is a set of tables that stores the results from profile sessions. It also contains reports that you run to view the profile session results. You can create a Data Profiling warehouse on any relational database that PowerCenter supports as a source or target database. Create a Data Profiling warehouse for each PowerCenter repository you want to store data profiles in.

 

Data Profiling Reports

 

You can view the results of each function configured in the data profile. Based on the type of metadata you want to view, you can view reports from the following tools:

  • Profile Manager.
  • Data Analyzer.

Profile Manager. PowerCenter Data Profiling reports provide information about the latest session run. View them from the Profile Manager.

Data Analyzer. Data Analyzer Data Profiling reports provide composite, metadata, and summary reports. View them from the Data Profiling dashboard in Data Analyzer. You can also customize the reports in Data Analyzer.

 

What is Data Profiling

 

Data profiling is a technique used to analyze the content, quality, and structure of source data. Use PowerCenter Data Profiling to detect patterns and exceptions in source data during mapping development and during production. You can use data profiling to perform the following types of analysis:

  • Make initial assessments.
  • Validate business rules.
  • Verify assumptions.
  • Verify report validity.

Make initial assessments. You can make initial assessments about data patterns and exceptions during mapping development. As a result, you can design mappings and workflows on actual data, rather than make theoretical assumptions about sources.

Validate business rules. You can validate documented business rules about the source data. For example, if you have a business rule requiring columns in a source table to contain U.S. ZIP codes, you can profile the source data to verify that the rows in this table contain the proper values.
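Outside PowerCenter, the same kind of rule can be spot-checked on a flat-file extract. A minimal sketch, assuming a hypothetical one-column file zip_column.txt containing the ZIP values:

# Print rows that violate the U.S. ZIP code rule (5 digits, optional +4 suffix).
# zip_column.txt is a hypothetical extract of the source column.
grep -Ev '^[0-9]{5}(-[0-9]{4})?$' zip_column.txt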

Verify assumptions. You can verify that the initial assumptions you made about source data during project development are still valid. For example, you may want to view statistics about how many rows satisfied a business rule and how many did not.

Verify report validity. You can use data profiling to verify the validity of Business Intelligence (BI) reports.

 

Export Informatica Objects using shell script

Exporting Informatica objects from an Informatica repository using a shell script is not complex. It is straightforward once you understand two commands: pmrep connect and pmrep ObjectExport.

pmrep connect is used to connect to an Informatica repository.

pmrep ObjectExport is used to export objects from an Informatica repository.

Code to connect to the Informatica repository:

pmrep connect -r <<repo-name>> -d <<domain-name>> -n <<infa_user>> -x <<infa_pwd>>

Code to export Informatica objects:

pmrep ObjectExport -n ${source_infa} -o workflow -m -s -b -r -f ${folder} -u ${tgt_infa}
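Putting the two commands together, a complete export script might look like the following sketch. The repository, domain, folder, and workflow names are placeholders, and the password is read from the INFA_PWD environment variable with -X rather than passed in plain text on the command line:

#!/bin/sh
# Sketch: connect to a repository and export one workflow to XML.
# All names below are placeholders; adjust them for your environment.
REPO=MY_REPO
DOMAIN=MY_DOMAIN
INFA_USER=admin
FOLDER=MY_FOLDER
WORKFLOW=wf_load_sales
OUT_XML=/tmp/${WORKFLOW}.xml

# -X reads the password from the INFA_PWD environment variable.
pmrep connect -r "$REPO" -d "$DOMAIN" -n "$INFA_USER" -X INFA_PWD || exit 1

# Export the workflow with its dependencies, shortcut references, and
# non-reusable and reusable dependents (-m -s -b -r).
pmrep ObjectExport -n "$WORKFLOW" -o workflow -m -s -b -r -f "$FOLDER" -u "$OUT_XML" || exit 1

echo "Exported $WORKFLOW to $OUT_XML"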

Informatica Operator Precedence

The transformation language supports the use of multiple operators and the use of operators within nested expressions.

If you write an expression that includes multiple operators, the PowerCenter Integration Service evaluates the expression in the following order:

1. Arithmetic operators
2. String operators
3. Comparison operators
4. Logical operators

The PowerCenter Integration Service evaluates operators in the order they appear in the following table. It evaluates operators of equal precedence in an expression from left to right.

The following table lists the precedence for all transformation language operators:

Operator          Meaning
( )               Parentheses.
+, -, NOT         Unary plus and minus, and the logical NOT operator.
*, /, %           Multiplication, division, modulus.
+, -              Addition, subtraction.
||                Concatenation.
<, <=, >, >=      Less than, less than or equal to, greater than, greater than or equal to.
=, <>, !=, ^=     Equal to; <>, !=, and ^= are equivalent forms of not equal to.
AND               Logical AND operator, used when specifying conditions.
OR                Logical OR operator, used when specifying conditions.

 

The transformation language also supports the use of operators within nested expressions. When expressions contain parentheses, the PowerCenter Integration Service evaluates operations inside parentheses before operations outside parentheses. Operations in the innermost parentheses are evaluated first.

For example, depending on how you nest the operations, the equation 8 + 5 - 2 * 8 returns different values:

Equation             Return Value
8 + 5 - 2 * 8        -3
8 + (5 - 2) * 8      32
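Most languages apply the same arithmetic precedence, so the two results above can be sanity-checked with shell arithmetic, which also multiplies before adding and honors parentheses:

echo $((8 + 5 - 2 * 8))      # prints -3: 2 * 8 is evaluated first
echo $((8 + (5 - 2) * 8))    # prints 32: the parentheses are evaluated first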

Introduction to Informatica

Informatica is an ETL tool that simplifies ETL operations. Informatica provides an environment that allows you to load data into a centralized location, such as a data warehouse or operational data store (ODS). You can extract data from multiple sources, transform the data according to business logic, and load the transformed data into files or target tables.

What is ETL?

ETL stands for Extract, Transform and Load.

  • Extraction – Extracting the data from different sources (flat files, databases, and so on).
  • Transformation – Applying business logic to the data to fulfill the requirement.
  • Loading – After applying the transformations, loading the data into the target file or database.
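As a toy illustration of the three steps outside any ETL tool, a single shell pipeline can extract from a source file, transform the rows, and load the result into a target file. The file names and column layout below are invented for the example:

# Toy ETL: read sales_src.csv (id,region,amount), keep rows with amount > 100,
# uppercase the region, and append the result to sales_tgt.csv.
awk -F, '$3 > 100 { print $1 "," toupper($2) "," $3 }' sales_src.csv >> sales_tgt.csv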

Why ETL?

Reporting and Analysis – Business users want reports that consolidate data so they can analyze the business. ETL is widely used to prepare the data behind such reports.

For example:

  • How much of the product was sold in this country by this time?
  • How many people used this product for this type of transaction?

Reports built on ETL-prepared data answer such questions.

PowerCenter Tools

  • Designer.
  • Workflow Manager.
  • Workflow Monitor.
  • Repository Manager.

Designer:

The Designer is the tool used to build the ETL logic. It consists of five components.

  • Source Analyzer –  Import or create source definitions.
  • Target Designer –  Import or create target definitions.
  • Transformation Developer – Develop transformations to use in mappings. You can also develop user-defined functions to use in expressions.
  • Mapplet Designer – Create sets of transformations to use in mappings. A mapplet is a reusable component that can contain more than one transformation.
  • Mapping Designer – Create mappings that the Integration Service uses to extract, transform, and load data. It shows the entire mapping between the sources and the targets.

Workflow Manager:

In the Workflow Manager, you define a set of instructions to execute tasks such as sessions, emails, and shell commands. This set of instructions is called a workflow.

The Workflow Manager has the following tools:

  • Task Developer – Create tasks you want to accomplish in the workflow.
  • Worklet Designer – Create a worklet in the Worklet Designer. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. You can nest worklets inside a workflow.
  • Workflow Designer – Create a workflow by connecting tasks with links in the Workflow Designer. You can also create tasks in the Workflow Designer as you develop the workflow.

Workflow Monitor:

You can monitor workflows and tasks in the Workflow Monitor. You can view details about a workflow or task in Gantt Chart view or Task view. You can run, stop, abort, and resume workflows from the Workflow Monitor. You can view sessions and workflow log events in the Workflow Monitor Log Viewer.
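The monitor operations also have command-line equivalents in pmcmd, which is useful in scripts. A sketch with placeholder service, domain, folder, and workflow names, reading the password from the INFA_PWD environment variable:

# Check the status and run details of a workflow (names are placeholders).
pmcmd getworkflowdetails -sv MY_INT_SERVICE -d MY_DOMAIN -u admin -pv INFA_PWD -f MY_FOLDER wf_load_sales

# Stop the same workflow if needed.
pmcmd stopworkflow -sv MY_INT_SERVICE -d MY_DOMAIN -u admin -pv INFA_PWD -f MY_FOLDER wf_load_sales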

Repository Manager:

The Repository Manager performs administrative operations such as creating users and groups, assigning rights to users, and monitoring flows.