Teradata Parallel Transporter - TPT
- TPT stands for Teradata Parallel Transporter. As the name implies, it transports data in parallel.
- It is the newer-generation load/unload utility provided by Teradata.
- It acts as an integrated ETL suite that helps to extract data from multiple sources, apply transformation logic, and load the data into the target Teradata database.
- TPT provides all the features of the standalone Teradata load/unload utilities (MultiLoad, TPump, FastExport, and FastLoad).
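- Each standalone utility has an equivalent TPT operator: the Load operator corresponds to FastLoad, the Update operator to MultiLoad, the Stream operator to TPump, and the Export operator to FastExport.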
- According to the available information, the legacy utilities (MultiLoad, TPump, FastExport, and FastLoad) will continue to work as before and will support the latest Teradata version, but any new load/unload features will be added to TPT.
- TPT is a Teradata utility that combines ETL capabilities with the features of the other Teradata load/unload utilities.
- In simple terms, TPT packages all the Teradata standalone utilities into one tool, along with additional features.
Teradata Parallel Transporter supports the following types of SQL statements:
- Data Manipulation Language (DML): Insert, Update, Delete, Upsert, Merge, and Select
- Data Control Language (DCL): Give, Grant, and Revoke
- Data Definition Language (DDL): Create, Drop, Alter, Modify, Delete Database, Delete User, and Rename
Unlike conventional utilities and products in which multiple data sources are usually processed serially, Teradata Parallel Transporter can access multiple data sources in parallel. This ability can lead to increased throughput. Teradata Parallel Transporter also allows different specifications for different data sources and, if their data is UNION-compatible, merges them.
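The idea of reading several sources at once and merging UNION-compatible rows can be illustrated with a small sketch. This is plain Python, not a TPT script, and the source names, rows, and functions (`SOURCES`, `read_source`, `union_all`) are hypothetical stand-ins chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

# Hypothetical in-memory "sources"; every row has the same column layout,
# which is what makes the sources UNION-compatible.
SOURCES = {
    "file_a": [(1, "alice"), (2, "bob")],
    "file_b": [(3, "carol")],
    "odbc_c": [(4, "dave"), (5, "erin")],
}

def read_source(name):
    """Stand-in for a producer that extracts every row from one source."""
    return list(SOURCES[name])

def union_all(source_names):
    """Read all sources concurrently and concatenate their rows.

    Like UNION ALL, nothing is de-duplicated.
    """
    with ThreadPoolExecutor(max_workers=len(source_names)) as pool:
        # pool.map returns results in the order of source_names.
        results = pool.map(read_source, source_names)
        return list(chain.from_iterable(results))

rows = union_all(["file_a", "file_b", "odbc_c"])
print(len(rows))  # 5 rows in total
```

In a serial utility the three reads would run one after another; here they overlap, which is where the throughput gain described above would come from when the reads are I/O bound.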
Teradata PT Parallel Environment
Although the traditional Teradata standalone utilities offer load and extract functions, these utilities are limited to a serial environment.
The figure below illustrates the parallel environment of Teradata PT.
Teradata PT uses data streams that act as pipelines between operators. With data streams, data flows from one operator to another. Teradata PT supports the following types of parallel environments:
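• Pipeline Parallelism: operators in successive stages (for example extract, transform, and load) run at the same time, each working on data handed over by the previous stage.
• Data Parallelism: multiple instances of the same operator run at the same time, each working on a share of the data.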
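The two kinds of parallelism can be sketched with threads and queues, where each queue plays the role of a data stream between operators. This is an illustrative Python model, not TPT syntax; the function names and the uppercase "transformation" are assumptions made for the example.

```python
import queue
import threading

SENTINEL = None  # marks the end of a data stream

def producer(rows, out_stream, n_downstream):
    """Extract stage: push rows, then one end marker per downstream instance."""
    for row in rows:
        out_stream.put(row)
    for _ in range(n_downstream):
        out_stream.put(SENTINEL)

def transformer(in_stream, out_stream):
    """Transform stage instance: works while the producer is still emitting."""
    while True:
        row = in_stream.get()
        if row is SENTINEL:
            out_stream.put(SENTINEL)
            break
        out_stream.put(row.upper())

def consumer(in_stream, n_upstream, target):
    """Load stage: collect rows until every upstream instance has finished."""
    finished = 0
    while finished < n_upstream:
        row = in_stream.get()
        if row is SENTINEL:
            finished += 1
        else:
            target.append(row)

def run_job(rows, n_transformers=2):
    extract_stream, load_stream = queue.Queue(), queue.Queue()
    target = []
    threads = [threading.Thread(target=producer,
                                args=(rows, extract_stream, n_transformers))]
    # Data parallelism: several instances of the same stage share one stream.
    threads += [threading.Thread(target=transformer,
                                 args=(extract_stream, load_stream))
                for _ in range(n_transformers)]
    threads.append(threading.Thread(target=consumer,
                                    args=(load_stream, n_transformers, target)))
    # Pipeline parallelism: all stages are started together and overlap.
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target

loaded = run_job(["a", "b", "c", "d"])
print(sorted(loaded))  # ['A', 'B', 'C', 'D']
```

Row order at the target is not guaranteed once several transformer instances share a stream, which is why the example sorts before printing.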
Teradata Parallel Transporter was designed for increased functionality and customer ease of use for faster, easier, and deeper integration. The capabilities include:
- Simplified data transfer between one Teradata Database and another; only one script is required to export from the production system and load the test system.
- The ability to load dozens of files using a single script makes the development and maintenance of the data warehouse easier.
- Distribution of workloads across CPUs on the load server eliminates bottlenecks in the data load process.
- Data flows through multiple instances of the Update operator and in-memory data streams to update tables.
- Data can be exported to an in-memory data stream instead of being landed on disk.
- The Open Database Connectivity (ODBC) operator reads through an ODBC driver, which can pull data from other databases, for example DB2 or Oracle.
- A wide range of data sources is accessible via open standards.
- Multiple operators can scan directories for files to load, combine the data in the in-memory data stream with a UNION ALL operation, and load it with the Stream operator.
- A script-building wizard is available to aid first-time users.
Important Features Of Teradata Parallel Transporter - TPT
The most important features of TPT are its scalability and parallelism.
Scalability & Performance Features Of Parallel Transporter
Below are some of the features of Parallel Transporter that can be used to increase load throughput:
- Parallel input file processing – splitting the I/O processing across multiple input files can sometimes help overcome a performance bottleneck.
- Directory scan feature – A single Parallel Transporter script can have multiple processes read multiple files in multiple directories.
- Multiple processes reading the same input file – in some cases having multiple read processes reading the same file can increase data throughput.
- Multiple processes sending data to the Teradata Database – performance can be improved by splitting the CPU time across multiple processes that are manipulating the data and then loading buffers to be sent to the Teradata Database.
- Multiple processes exporting the same table.
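As a rough illustration of the directory scan and parallel input file processing ideas, the sketch below finds every matching file in a directory and spreads the files over several reader instances. It is plain Python rather than a TPT script; `count_rows`, `scan_and_read`, the `*.csv` pattern, and the instance count are assumptions made for the example.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def count_rows(path):
    """Stand-in for one reader instance: read a file and return its row count."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)

def scan_and_read(directory, pattern="*.csv", instances=4):
    """Scan a directory and distribute the matching files over reader instances."""
    files = sorted(Path(directory).glob(pattern))
    with ThreadPoolExecutor(max_workers=instances) as pool:
        counts = list(pool.map(count_rows, files))
    return dict(zip((f.name for f in files), counts))

# Demo with throwaway files: three CSV parts plus one file that should be skipped.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        Path(tmp, f"part_{i}.csv").write_text("id,name\n" * (i + 1), encoding="utf-8")
    Path(tmp, "notes.txt").write_text("ignored\n", encoding="utf-8")
    result = scan_and_read(tmp)

print(result)  # {'part_0.csv': 1, 'part_1.csv': 2, 'part_2.csv': 3}
```

Whether extra reader instances actually help depends on where the bottleneck sits; if a single disk is already saturated, more readers may add little.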