Shell Script Best Practices

1. Names of constants and environment variables should be UPPER-CASE

2. Names of mutable variables should be lower-case

3. Variables inside a function should be named like "_xxx_xx", starting with an underscore "_", to avoid conflicts with global variables

4. Use ${xxx} to reference a variable; do not omit the braces
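
A quick illustration of why the braces matter (the variable names here are hypothetical):

file="report"
echo "$file_txt"     # expands the (unset) variable file_txt and prints an empty line
echo "${file}_txt"   # prints "report_txt" as intended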

5. Indent code consistently (pick one indent width and stick to it)

6. Shell functions

  • Treat a shell function like a real function, and avoid using global variables
  • Put related functions into one file, and write a test script to exercise them
  • Use return (or echo) to return an integer from a function, echo to return a string, and exit when the whole program should abort (see the sketch after this list)
  • Function names should end with "_func"
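
A minimal sketch of these conventions (function and variable names are hypothetical):

add_func() {
    _a=$1                      # leading underscore avoids clashes with globals
    _b=$2
    echo $(( _a + _b ))        # echo hands the result back to the caller as a string
    return 0                   # return passes back an integer status code
}

sum=$(add_func 2 3)            # capture the echoed value
echo "sum is ${sum}"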

7. Comments

  • File header comments should contain a description, usage (optional), author, and create/change dates (see the hypothetical header after this list)
  • Line comments: either on the line before the code or at the end of the line
  • Section comments: comments for a block of code, written just like line comments
  • Function comments: should contain a description, input parameters, and return value
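
A hypothetical file header following these rules:

#!/bin/bash
# Description : back up the application log directory
# Usage       : backup_logs.sh <target-dir>
# Author      : jdoe
# Created     : 2012-01-15    Changed : 2012-03-02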

8. Make sure the script can be started from any directory (that is, do not assume it will be invoked as ./your-script.sh)
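
One common way to achieve this is to resolve the script's own directory at startup (a sketch):

script_dir=$(cd "$(dirname "$0")" && pwd)    # absolute directory of this script
cd "${script_dir}" || exit 1                 # paths below are now relative to the script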

9. When testing for an empty string, quote the variable with double quotes
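
For example (the variable name is hypothetical):

if [ -z "${name}" ]; then    # the quotes keep the test well-formed when name is empty or unset
    echo "name is empty"
fi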

10. Script argument check

  • If your script takes fewer than 3 arguments, you can use the positional parameters ($1, $2, ...) to check and read them
  • If your script has many options to set, use getopts (see the sketch below)
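
A minimal getopts sketch (the option letters and variable names are hypothetical):

verbose=0
while getopts "f:v" opt; do
    case "${opt}" in
        f) file="${OPTARG}" ;;                            # -f takes a value
        v) verbose=1 ;;                                   # -v is a flag
        *) echo "Usage: $0 [-f file] [-v]" >&2; exit 1 ;;
    esac
done
shift $((OPTIND - 1))                                     # $1, $2, ... are now the remaining arguments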

11. Code order in a shell script: global variables -> shell functions -> main script
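
A skeleton illustrating this order (names are hypothetical):

#!/bin/bash
LOG_DIR="/var/log/myapp"             # 1. global variables (constants in UPPER-CASE)

clean_logs_func() {                  # 2. shell functions
    rm -f "${LOG_DIR}"/*.tmp
}

clean_logs_func                      # 3. main script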


Extract Transform Load

Extract, transform, and load (ETL) is a process in data warehousing that involves

  • extracting data from outside sources,
  • transforming it to fit business needs, and ultimately
  • loading it into the data warehouse.

ETL is important, as it is the way data actually gets loaded into the warehouse. This article assumes that data is always loaded into a data warehouse, whereas the term ETL can in fact refer to a process that loads any database.

Extract

The first part of an ETL process is to extract the data from the source systems. Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization / format. Common data source formats are relational databases and flat files, but may include non-relational database structures such as IMS or other data structures such as VSAM or ISAM. Extraction converts the data into a format for transformation processing.

Transform

The transform phase applies a series of rules or functions to the extracted data to derive the data to be loaded. Some data sources will require very little manipulation of data. However, in other cases any combination of the following transformation types may be required:

  • Selecting only certain columns to load (or, equivalently, choosing which columns not to load)
  • Translating coded values (e.g. if the source system stores M for male and F for female but the warehouse stores 1 for male and 2 for female; see the sketch after this list)
  • Encoding free-form values (e.g. mapping "Male", "M", and "Mr" onto 1)
  • Deriving a new calculated value (e.g. sale amount = quantity * unit price)
  • Joining together data from multiple sources (e.g. lookup, merge)
  • Summarizing multiple rows of data (e.g. total sales for each region)
  • Generating surrogate key values
  • Transposing or pivoting (turning multiple columns into multiple rows, or vice versa)
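
For instance, the coded-value translation above could be done in a shell pipeline; this sketch assumes a hypothetical CSV layout with the gender code in the second column:

# map the source codes M/F in column 2 to the warehouse codes 1/2
awk -F',' 'BEGIN { OFS="," } { if ($2 == "M") $2 = 1; else if ($2 == "F") $2 = 2; print }' \
    customers.csv > customers_transformed.csv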

Load

The load phase loads the data into the data warehouse. Depending on the requirements of the organization, this process varies widely. Some data warehouses merely overwrite old information with new data. More complex systems can maintain a history and an audit trail of all changes to the data.


Data Warehouse Concepts

OLTP:

  • Online Transaction Processing.
  • It is fully transaction based.
  • Transactions should complete as quickly as possible.
  • It follows a client-server model.
  • It handles current data.
  • SQL statements retrieve only small amounts of data.
  • Data is stored in normalized form.
  • The most used SQL statements are INSERT, UPDATE, and DELETE.
  • Heterogeneous data.
  • Examples: online reservation systems, withdrawing money from an ATM, etc.

OLAP:

  • Online Analytical Processing.
  • Also called a "Decision Support System".
  • Denormalized data.
  • Mainly focused on reading from the database.
  • It is fully analysis and report oriented.
  • Main users are management and business people.
  • It handles historical data.
  • Data is loaded mostly in bulk.
  • Homogeneous data.

Difference between OLTP and OLAP

  • OLTP stands for Online Transaction Processing; OLAP stands for Online Analytical Processing.
  • OLTP systems are the original source of the data; OLAP data comes from the various OLTP databases.
  • OLTP is transaction oriented; OLAP is analytical and report oriented.
  • OLTP holds normalized data; OLAP holds denormalized data.
  • OLTP handles current data; OLAP handles historical data.
  • In OLTP, insert, update, and delete operations are all common; OLAP is mainly bulk-loaded, so inserts dominate, with updates and deletes performed only where necessary.
  • OLTP select queries retrieve only small amounts of data; OLAP select queries retrieve bulk volumes of data.
  • OLTP is used by a large number of end users; OLAP is used by a small number of users.
  • OLTP consists of heterogeneous data; OLAP consists of homogeneous data.

Tasks performed when data moves from the live system to the data warehouse:

  • Denormalizing the data.
  • Data cleansing.
  • Data trimming.
  • Removing duplicate records.
  • Decoding the data.
  • Calculating derived data.
  • Data validation, if required.
  • Null value replacement.
  • Data type conversion.

Life Cycle of a Data Warehousing Project:

  • Project planning
    • Budget allocation
    • Internal resource allocation
  • BRS (Business Requirement Study)
  • Dimensional modeling
    • This involves conversion of OLTP tables to OLAP tables.
    • This is performed by the dimensional modeler.
  • Physical design
  • Application development
  • Deployment and training
  • Maintenance

Teams involved in data warehousing:

  • Dimensional modeler
  • DBA
  • Report developer
  • Report admin
  • ETL developer
  • ETL admin
  • System admin

Data Warehouse

  • Manages historical data and helps management take the right decisions at the right time.
  • A DW is a subject-oriented, non-volatile, and integrated collection of data that supports management decisions.

Data Mart

  • A data mart is a subset of a data warehouse, typically focused on a single subject area or department.

DWH Design Approaches

  • Top-down approach: build the central data warehouse first, then derive data marts from it.
  • Bottom-up approach: build data marts first, then integrate them into a data warehouse.

Dimension Table

  • Denormalized data
  • Must have one primary key
  • Contains fewer records than the fact table
  • Contains a large number of columns

Fact Table

  • Contains foreign keys to all the dimension tables
  • Contains far more records than the dimension tables

Dimensional Model Types

  • Star schema
  • Snowflake schema

Star Schema

  • The fact table is surrounded by many dimension tables.

 


 

PERL Data Types and Objects

Perl has three data types:

  • scalars – a scalar holds a single elementary value such as a string, integer, or float.
  • arrays of scalars – an array, for short, holds an ordered list of scalar values.
  • associative arrays of scalars – also called hashes, these hold key-value pairs. They are like arrays, except that they are indexed by strings whereas arrays are indexed by integers.

Declaration of Scalars

Syntax:  $var_name
Example: $name   # holds a string value
         $num    # holds a numeric value

Note: there is no difference in how string and numeric variables are declared.

Declaration of Arrays

Syntax:  @arr_name
Example: @name   # holds a list of scalar values

Declaration of Hashes

Syntax:  %hash_name
Example: %age    # holds key-value pairs

Accessing Scalar:

$var_name

Accessing Arrays:

Single element at index i – $arr_name[i]
All elements at once – @arr_name

Accessing Hashes:

Single element at 'key' position – $hash_name{key}
All elements at once – %hash_name

 


Export Informatica Objects using shell script

Exporting Informatica objects from an Informatica repository with a shell script is not complex. It becomes straightforward once you understand a couple of commands: pmrep connect and pmrep ObjectExport.

pmrep connect is used to connect to an Informatica repository.

pmrep ObjectExport is used to export objects from an Informatica repository.

Code to connect to the Informatica repository:

pmrep connect -r <<repo-name>> -d <<domain-name>> -n <<infa_user>> -x <<infa_pwd>>

Code to export Informatica objects:

pmrep ObjectExport -n ${source_infa} -o workflow -m -s -b -r -f ${folder} -u ${tgt_infa}
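
Putting the two commands together, here is a minimal sketch of an export script. The repository, domain, folder, workflow, and file names are hypothetical, pmrep must be on the PATH, and the exact flags accepted can vary with your PowerCenter version:

#!/bin/bash
# Hypothetical environment settings -- replace with your own values.
# Note: a clear-text password is shown only for illustration.
REPO="DEV_REPO"
DOMAIN="Domain_dev"
INFA_USER="admin"
INFA_PWD="secret"
folder="SALES_FOLDER"
source_infa="wf_load_sales"          # the workflow to export
tgt_infa="/tmp/wf_load_sales.xml"    # the output XML file

# Connect to the repository; abort if the connection fails
pmrep connect -r "${REPO}" -d "${DOMAIN}" -n "${INFA_USER}" -x "${INFA_PWD}" || exit 1

# Export the workflow and its dependent objects to XML
pmrep ObjectExport -n "${source_infa}" -o workflow -m -s -b -r -f "${folder}" -u "${tgt_infa}"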

 

If you need more, visit our site: www.iguddy.com