SQL Server Integration service (SSIS) is a component of the Microsoft SQL Server Database is a software that is used to achieve a broad scope of data migration tasks. SSIS is an objective for workflow applications and data integration. It is a flexible data warehousing tool that is used for data loading, extraction, and transformation like merging, cleaning, aggregating the data, etc. 

It can extract data from different sources like Excel files, Oracle, SQL Server databases, DB2 databases, etc. SSIS makes it easy to move data from one database to another database. SSIS also includes graphic tools and windows wizards to perform workflow functions like sending emails, FTP operations, Data sources and destination.

 

Index

SSIS Components

SSIS ETL 

DBA Tasks in SSIS 2005 

Event Handlers in SSIS 

Different ways to execute SSIS package

Why we use SSIS

The following are the key reasons to use SSIS

  • SSIS tool helps to merge data from various data stores
  • It helps to clean and standardize the data
  • It can load the data from one datastore to another datastore very easily
  • SSIS contains a GUI that allows the users to transfer the data quickly instead of writing an extensive program
  • Populates Data warehouse and data marts
  • It is cheaper than other ETL tools
  • Automates administrative functions and Data loading
  • Build Business Intelligence into a Data Transformation process
  • It provides robust error and event handling.

Features of SSIS

The following are the salient features of SSIS

  • Relevant Data integration functions
  • Data mining query transformation
  • High-speed data connectivity elements such as connectivity to Oracle or SAP
  • Effective implementation speed
  • Provides rich Studio environment
  • Tight integration with other Microsoft SQL software
  • Term extraction and Term Lookup transformation

SSIS Architecture

Following are the components of SSIS architecture:

  • Control flow
  • Data flow
  • Event handling
  • Package explorer
  • Parameters

Control flow

Control flow acts as a brain of the SSIS package. It includes containers and tasks that are managed by precedence constraints. It helps to arrange the order of execution for all its components. 

Precedence constraints

The precedence constraints are the package component that directs tasks to execute in a predefined order. It defines the workflow of the entire SSIS package. It helps you to connect tasks to control the flow. Depending on the configuration, the precedence constraints can be represented as dotted or solid lines with blue, red or green colour. 

Task

A task is an individual unit of work. It is the same as a method used in a programming language. We don’t use any programming codes, but we implement drag and drop techniques to design surfaces and to configure them.

Container

Containers are objects that help SSIS to provide structure to one or more tasks. It provides visual consistency and also allows to declare event handlers and variables that could be in the scope of specific containers.

There are three types of containers. They are as follows:

  • Sequence container: Sequence container is a subset of an SSIS package. It acts as a single control point for the tasks that are defined inside a container. It is used for grouping the tasks. We can split the control flow into multiple logical units using sequence containers.
  • For loop container: It is used for executing all inside tasks for a fixed number of executions. It provides the same functionality as the sequence container except that it also allows us to run the tasks multiple times
  • Foreach loop container: It is more complicated than For Loop container since it has many use cases and requires more complex configurations. It can accomplish more popular actions such as looping across files within a directory or looping over an executed SQL task result set.

Data flow

Data flow tasks encapsulate the data flow engine that moves data between source and destination and allows the user to transform, clean, and modify the data as it is to run. It is also termed as Heart of SSIS

Packages

One of the core components of SSIS is packages. It is the collection of tasks that execute in order. The precedence constraint helps to manage the order in which the task will run. Packages can help the user to save files onto a SQL Server in the MSDB or package catalogue database. 

Parameters

Parameters allow the user to assign values to the properties within packages at the time of execution. It behaves much like variables but with a few main exceptions. It also permits you to change package execution without editing and redeploying the package.

How SSIS works

SSIS is a platform for data integration and workflow. Both the tasks data transformation and workflow creation can be done using SSIS packages. It consists of three major components, namely: 

  • Operational data
  • ETL process
  • Data warehouse

Operational data

Operational data is a database designed to integrate data from multiple sources for additional operations on the data. It is the place where most of the data used in the current operation are housed before it is transferred to the data warehouse for long term storage.

ETL Process

Extract, Transform, and Load(ETL) is a method of extracting the data from different sources, transforming this data to achieve the requirement and loading into a target data warehouse. The data can be in any format XML file, flat file, or any database file. It also ensures that the data stored in the data warehouse is accurate, high quality, relevant and useful for the users.

Extract: It is the process of extracting the data from various data sources depending on different validation points. And the data can be any format such as XML, flat file, or any database file. 

Transformation: In transformation, the entire data is analyzed, and various functions are applied to it to load the data to the targeted database in a cleaned and general format.

Load: It is the process of loading the cleaned and extracted data to a target database using minimal resources. It also validates the number of rows that have been processed. The index helps to track the number of rows that are loaded in the data warehouse. It also helps to identify the data format.

Data warehouse

The data warehouse is a single, complete, and consistent store of data that is formulated by combining data from multiple data sources. It captures the data from diverse sources for useful analysis and access. Data warehousing is a large set of data accumulated that is used for assembling and managing data from various sources to answer business questions. It helps in making a decision.

Types of SSIS tasks

In the SSIS tool, there are different types of tasks that are to perform various kinds of activities.

Task

Description

Execute SQL task

It executes the SQL report against a relative database.

Data Flow task

It can read the data from different sources. Transform the data when it is in the memory and write it out against various destinations

Execute Package task

It is to execute other packages within the same project.

Execute process task

It is to specify the command line parameters.

Analysis Services Processing task

It is to process objects of a tabular model or as an SSAS cube.

FTP task

It allows the user to perform basic FTP functionalities.

File System task

It performs manipulations in the file system such as deleting files, creating directories, renaming files and moving the source file. 

Script task

It is a blank task. You can write a .NET code that performs any task; you want to accomplish.

Sent mail task

It is to send an email to notify users that your package has finished, or some error occurs.

Bulk insert task

Use can loads data into a table by using the bulk insert command.

XML task

It helps to split, merge, or reformat any XML file.

WMI event watcher task

It allows the SSIS package to wait for and respond to certain WMI events.

Web Service task

It executes a method on a web service.

Other essential ETL tools

  • SAS Data management
  • Oracle Warehouse Builder
  • IBM Infosphere Information Server
  • Elixir Repertoire for Data ETL
  • SAP Data Services
  • PowerCenter Informatica
  • Sargent Data flow

Best practices of SSIS

  • SSIS is an in-memory pipeline. So it is essential to ensure that all transformations occur in memory
  • Plan for capacity by understanding resource utilization
  • Schedule and distribute it correctly
  • Optimize the SQL Lookup transformation, data source, and destination
  • Try to minimize logged operations

Advantages of using SSIS

SSIS tool offers many benefits. Some of them are as follows:

  • Ease and speed implementation 
  • Broad documentation support
  • Tight integration with SQL server
  • Provides real-time, message-based capabilities
  • Support for the distribution model
  • It allows you to use the SQL Server Destination rather than OLE DB to load the data faster
  • Standardized data integration
  • It helps you to remove networks as a restriction for the insertion of data by SSIS into SQL.

Drawbacks of Using SSIS

Few disadvantages of using SSIS tool are as follows

 

  • Unclear vision and strategy
  • SSIS lacks support for alternative data integration styles
  • Sometimes it creates issues in non-windows environments
  • Problematic integration with other products