SAP HANA Tutorials Overview
Welcome to SAP HANA Tutorials. The objective of these tutorials is to build an in-depth understanding of SAP HANA. We will cover topics such as SAP HANA architecture, data modeling, replication, creating views, and joins.
In addition to the SAP HANA tutorials, we will cover common interview questions and issues with SAP HANA.
SAP HANA Overview
SAP BW 7.3 (BW Data)
SAP ECC (Transaction data)
BO 4.0 (Reporting tool)
DS 4.0 (ETL)
Sybase (Replication Agent)
Non-SAP OLAP (data sources)
SAP HANA Environment
HANA Database – place to prepare the hybrid database
HANA Studio – modeling and administration
HANA Information Composer – web-based modeling
Enable end users to do ‘near real time’ reporting from SAP/non-SAP ERP systems. (‘Near real time’ means the updated data arrives after a short delay.)
Example: every minute
Enable process-oriented BI from SAP BW/non-SAP OLAP data sources
To create a hybrid in-memory database using an advanced process
ii) Hybrid database development
iv) Smart Business Intelligence
The source systems are SAP ERP, non-SAP ERP, SAP BW, and non-SAP OLAP.
Data is retrieved from these data providers into SAP HANA by replication.
There are three types of replication for retrieving data from data providers:
Trigger based Replication
ETL based Replication
Log based Replication
SAP HANA Trigger Based Replication
Trigger-based data replication using SAP Landscape Transformation (LT) Replication Server is based on capturing database changes at a high level of abstraction in the source ERP system. This method is database-independent, and it can parallelize database changes across multiple tables or by segmenting changes to large tables.
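The idea of capturing changes with triggers can be sketched with a toy example. This is not how SAP LT Replication Server is implemented; it only illustrates the pattern, using Python's built-in sqlite3 module with two in-memory databases standing in for the source system and SAP HANA. The table names `orders` and `orders_log` and the function `replicate` are hypothetical.

```python
import sqlite3

# Toy stand-ins: "source" plays the ERP database, "target" plays SAP HANA.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
-- Hypothetical logging table that records which rows changed.
CREATE TABLE orders_log (seq INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER, op TEXT);
CREATE TRIGGER orders_ins AFTER INSERT ON orders
BEGIN INSERT INTO orders_log (id, op) VALUES (NEW.id, 'I'); END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders
BEGIN INSERT INTO orders_log (id, op) VALUES (NEW.id, 'U'); END;
CREATE TRIGGER orders_del AFTER DELETE ON orders
BEGIN INSERT INTO orders_log (id, op) VALUES (OLD.id, 'D'); END;
""")
target.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")


def replicate(src, dst):
    """Apply logged changes to the target in order, then clear the log."""
    entries = src.execute("SELECT seq, id, op FROM orders_log ORDER BY seq").fetchall()
    for seq, row_id, op in entries:
        if op == "D":
            dst.execute("DELETE FROM orders WHERE id = ?", (row_id,))
        else:
            row = src.execute("SELECT id, amount FROM orders WHERE id = ?", (row_id,)).fetchone()
            if row is not None:  # the row may have been deleted by a later change
                dst.execute("INSERT OR REPLACE INTO orders VALUES (?, ?)", row)
        src.execute("DELETE FROM orders_log WHERE seq = ?", (seq,))
    src.commit()
    dst.commit()
    return len(entries)


source.execute("INSERT INTO orders VALUES (1, 10.0)")
source.execute("INSERT INTO orders VALUES (2, 25.5)")
source.execute("UPDATE orders SET amount = 12.0 WHERE id = 1")
source.execute("DELETE FROM orders WHERE id = 2")
applied = replicate(source, target)
rows = target.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
print(applied, rows)
```

The logging table stores only the key and the kind of change, so the replication step reads the current row from the source when it applies an insert or update. Until `replicate` runs, the logging table keeps growing with every change on the source.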
SAP HANA ETL-Based Replication
Extraction-Transformation-Load (ETL) based data replication uses SAP BusinessObjects Data Services to specify and load the relevant business data at defined intervals from an ERP system into the SAP HANA database. You can reuse the ERP application logic by reading extractors or utilizing SAP function modules. In addition, the ETL-based method offers options for integrating third-party data providers.
SAP HANA Log-Based Replication
Transaction log-based data replication using Sybase Replication is based on capturing table changes from low-level database log files. This method is database-dependent.
Database changes are propagated on a per database transaction basis, and they are then replayed on the SAP HANA database. This means consistency is maintained, but at the cost of not being able to use parallelization to propagate changes.
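The per-transaction replay described above can be illustrated with a small sketch. It is a simplified model, not Sybase Replication's actual mechanism: the log format, the key names, and the function `replay` are assumptions made for the example.

```python
# Hypothetical transaction log: each entry is one committed source transaction,
# holding the table changes it made as (operation, key, value).
transaction_log = [
    [("put", "account:A", 100), ("put", "account:B", 50)],
    [("put", "account:A", 70), ("put", "account:B", 80)],   # a transfer of 30
    [("delete", "account:B", None)],
]


def replay(log, target):
    """Replay transactions one at a time, in commit order."""
    for changes in log:
        staged = dict(target)          # work on a copy so a failure leaves target untouched
        for op, key, value in changes:
            if op == "put":
                staged[key] = value
            elif op == "delete":
                staged.pop(key, None)
            else:
                raise ValueError(f"unknown operation: {op}")
        target.clear()
        target.update(staged)          # the whole transaction becomes visible at once
    return target


replica = replay(transaction_log, {})
print(replica)
```

Because each transaction is applied as a unit and strictly in order, the target never shows half of the transfer. The same ordering requirement is what rules out applying the transactions in parallel.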
SAP HANA Components
If the customer wants to use ETL-based replication, the following components are required:
SAP Host Agent
HANA Information Composer
If the customer is planning to adopt trigger-based data replication for the HANA database, the following components are required:
i) SAP HANA Database
ii) HANA Studio
iii) HANA Client
iv) SAP Host agent
v) HANA Information Composer
vi) SAP Landscape Transformation Replication Server
vii) SAP HANA Load Controller
If the customer is planning to use log-based data replication, the following components are required:
i) SAP HANA DB
ii) HANA Studio
iii) HANA Client
iv) HANA Host Agent
v) HANA Information Composer
vi) HANA Load Controller
vii) Sybase Adaptive Server
viii) Sybase Replication Server/Agent
ETL-based and trigger-based data replication are applied at replication level.
Log-based replication is adopted at table level.
SAP HANA DB
It comes with the SAP appliance and has a row store and a column store, which are maintained by different calculation and SQL engines.
SAP HANA Studio
It allows you to:
ii) Manage replication servers
iii) Communicate with Data Services
SAP Host Agent
This is required to start and stop services on every computer system; it also enables maintenance activities.
- It enables group communication
- It is a Basis component (maintained by Basis)
SAP HANA DB Client
This is a plug-in installed on the client system to send data inputs to the HANA DB remotely.
HANA Information Composer is a web-based utility where the end user can upload data and create quick data views, which can be given as input to Microsoft Excel and to BO Xcelsius for reporting.
The load controller is a HANA API (Application Programming Interface) that resides in the SAP HANA architecture and controls the replication process. It also takes care of initial and delta loads to the HANA DB.
SAP HANA Landscape Transformation
This utility is used to monitor the data browsing when data is extracted from different ERP systems. It helps in bulk data transfer without the transformations that are used in BODS.
Sybase Adaptive Server
It is an RDBMS from Sybase that handles OLTP processing, XML documents, etc. (grain level). It also helps with real-time reporting and disaster recovery between source and target.
BODS is an ETL utility with a Designer component that can connect to SAP ERP (SAP Application Datastore) and load into the SAP HANA database as a target, using normal data flows or ABAP data flows (system generated).
- To run the ETL process, we should also configure the BODS Job Server (batch processing) and Access Server (real-time processing)
Data Replication methods & their selections based on Source Systems
SAP HANA Interview Questions
What is SAP HANA?
SAP HANA is an in-memory database.
-It is a combination of hardware and software made to process massive real time data using In-Memory computing.
-It combines row-based, column-based database technology.
-Data now resides in main-memory (RAM) and no longer on a hard disk.
-It’s best suited for performing real-time analytics, and developing and deploying real-time applications.
An in-memory database means all the data is stored in memory (RAM). No time is wasted loading data from hard disk to RAM, or keeping some data in RAM and some temporarily on disk during processing. Everything is in memory all the time, which gives the CPUs quick access to data for processing.
SAP HANA is equipped with a multi-engine query processing environment that supports relational as well as graphical and text data within the same system. It provides features that support high processing speed, huge data sizes, and text mining.
What is the language SAP HANA is developed in?
The SAP HANA database is developed in C++.
What is the operating system supported by HANA?
Currently SUSE Linux Enterprise Server x86-64 (SLES) 11 SP1 is the Operating System supported by SAP HANA.
Can I just increase the memory of my traditional Oracle database to 2TB and get similar performance?
NO. You might have performance gains due to more memory available for your current Oracle/Microsoft/Teradata database but HANA is not just a database with bigger RAM. It is a combination of a lot of hardware and software technologies. The way data is stored and processed by the In-Memory Computing Engine (IMCE) is the true differentiator. Having that data available in RAM is just the icing on the cake.
What are the row-based and column-based approaches?
Row based tables
It is the traditional Relational Database approach
It stores a table as a sequence of rows
Column based tables
It stores a table as a sequence of columns, i.e. the entries of a column are stored in contiguous memory locations.
SAP HANA is particularly optimized for column-order storage.
SAP HANA supports both row-based and column-based approach.
The following figure explains the difference between the two storage mechanisms.
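As a rough illustration of the two layouts, the sketch below stores the same small table both ways using plain Python lists. The table contents and variable names are made up for the example, and real storage engines work on raw memory rather than Python objects.

```python
# The same three-record table in two layouts (values are made up for illustration).
records = [
    ("DE", "Widget", 120),
    ("US", "Gadget", 300),
    ("DE", "Gadget", 80),
]

# Row store: the fields of one record sit next to each other.
row_store = [field for record in records for field in record]

# Column store: the values of one column sit next to each other.
column_store = {
    "country": [r[0] for r in records],
    "product": [r[1] for r in records],
    "sales": [r[2] for r in records],
}

# Reading one full record is a single contiguous slice in the row layout.
second_record = row_store[3:6]

# Summing one column touches only that column in the column layout.
total_sales = sum(column_store["sales"])

print(row_store)
print(second_record, total_sales)
```

Fetching a whole record is cheap in the row layout, while an aggregation over one column is cheap in the column layout because the other columns are never read.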
What are the advantages and disadvantages of row-based tables?
Row based tables have advantages in the following circumstances:
-The application needs to only process a single record at one time (many selects and/or updates of single records).
-The application typically needs to access a complete record (or row).
-Neither aggregations nor fast searching are required.
-The table has a small number of rows (e. g. configuration tables, system tables).
Row-based tables have disadvantages in the following circumstances:
- In analytic applications where aggregations are used and fast search and processing are required. In row-based tables, all data in a row has to be read even if the query only needs a few columns.
What are the advantages of column-based tables?
Faster Data Access
Only affected columns have to be read during the selection process of a query. Any of the columns can serve as an index.
Better Compression
Columnar data storage allows highly efficient compression because the majority of the columns contain only a few distinct values (compared to the number of rows).
Better Parallel Processing
In a column store, data is already vertically partitioned. This means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core.
In HANA which type of tables should be preferred – Row-based or Column-based?
SQL queries involving aggregation functions take a lot of time on huge amounts of data because every single row is touched to collect the data for the query response.
In columnar tables, this information is stored physically next to each other, significantly increasing the speed of certain data queries. Data is also compressed, enabling shorter loading times.
To enable fast on-the-fly aggregations, ad-hoc reporting, and to benefit from compression mechanisms it is recommended that transaction data is stored in a column-based table.
The SAP HANA database allows joining row-based tables with column-based tables. However, it is more efficient to join tables that are located in the same store (row or column). For example, master data that is frequently joined with transaction data should also be stored in column-based tables.
How does SAP HANA support Massively Parallel Processing?
With the availability of multi-core CPUs, higher CPU execution speeds can be achieved. HANA's column-based storage also makes it easy to execute operations in parallel using multiple processor cores.
In a column store data is already vertically partitioned. This means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core.
In addition, operations on one column can be parallelized by partitioning the column into multiple sections that can be processed by different processor cores. With the SAP HANA database, queries can be executed rapidly and in parallel.
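Partitioning one column into sections and combining partial results can be sketched as follows. This is only an illustration of the pattern, not SAP HANA's engine: the helper names `split` and `parallel_sum` are hypothetical, and Python threads are used for simplicity even though they do not give true multi-core speedups for this kind of work in CPython.

```python
from concurrent.futures import ThreadPoolExecutor


def split(column, parts):
    """Cut a column into `parts` contiguous sections of near-equal size."""
    size, extra = divmod(len(column), parts)
    sections, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        sections.append(column[start:end])
        start = end
    return sections


def parallel_sum(column, workers=4):
    """Sum each section in its own worker, then combine the partial sums."""
    sections = split(column, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, sections))
    return sum(partials)


sales = list(range(1, 1001))   # made-up column values
total = parallel_sum(sales)
print(total)
```

The approach works for any aggregation whose partial results can be combined afterwards, such as sums, counts, minimums, and maximums.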
What is ad-hoc analysis?
In traditional data warehouses, such as SAP BW, a lot of pre-aggregation is done for quick results. That is, the administrator (IT department) decides which information might be needed for analysis and prepares the result for the end users. This results in fast performance, but the end user does not have flexibility.
Performance drops dramatically if the user wants to analyze data that is not already pre-aggregated. With SAP HANA and its speedy engine, no pre-aggregation is required. Users can perform any kind of operation in their reports and do not have to wait hours for the data to be ready for analysis.
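The difference between a pre-aggregated result and ad-hoc aggregation over detail data can be shown with a small sketch. The records, column names, and the `aggregate` function are made up for illustration and do not represent a SAP HANA or SAP BW API.

```python
from collections import defaultdict

# Made-up detail records: (region, product, year, revenue).
facts = [
    ("EMEA", "Widget", 2011, 100),
    ("EMEA", "Gadget", 2011, 40),
    ("APJ", "Widget", 2012, 70),
    ("EMEA", "Widget", 2012, 60),
]
COLUMNS = ("region", "product", "year", "revenue")

# Pre-aggregation: only the grouping chosen in advance (revenue by region) is available.
preaggregated = {"EMEA": 200, "APJ": 70}


def aggregate(rows, group_by, measure="revenue"):
    """Group detail rows by any combination of columns at query time."""
    key_idx = [COLUMNS.index(c) for c in group_by]
    measure_idx = COLUMNS.index(measure)
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in key_idx)] += row[measure_idx]
    return dict(totals)


by_product_year = aggregate(facts, ["product", "year"])
print(by_product_year)
```

The pre-aggregated dictionary can only answer the question it was built for, while `aggregate` can answer a new question such as revenue by product and year, at the cost of scanning the detail rows each time.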
What is HANA modeling studio?
Modeling studio in HANA performs multiple tasks, such as:
- Declares which tables are stored in HANA; the first part is to get the metadata and then schedule data replication jobs
- Manages Data Services to load data from SAP Business Warehouse and other systems
- Manages ERP instance connections; the current release does not support connecting to several ERP instances
- Uses Data Services for the modeling
- Does modeling in HANA itself
- Essential licenses for SAP BO Data Services
What are the different compression techniques?
There are three different compression techniques
- Run-length encoding
- Cluster encoding
- Dictionary encoding
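Two of these techniques, dictionary encoding and run-length encoding, can be sketched in a few lines. This is a simplified illustration with made-up data and hypothetical function names, not SAP HANA's internal format; cluster encoding is not shown.

```python
def dictionary_encode(column):
    """Replace each value by a small integer ID into a sorted dictionary of distinct values."""
    dictionary = sorted(set(column))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in column]


def run_length_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, run_length] pairs back into the full sequence."""
    return [value for value, count in runs for _ in range(count)]


country = ["DE", "DE", "DE", "US", "US", "DE", "FR", "FR"]   # made-up column
dictionary, value_ids = dictionary_encode(country)
runs = run_length_encode(value_ids)
restored = [dictionary[i] for i in run_length_decode(runs)]
print(dictionary, value_ids, runs)
```

Dictionary encoding pays off when a column has few distinct values, and run-length encoding pays off when equal values sit next to each other, which is why the two are often combined on sorted columns.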
What are transformation rules?
A transformation rule is a rule specified in the advanced replication settings transaction for the source table so that data is transformed during the replication process.
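Conceptually, a transformation rule is a function applied to each row on its way from source to target. The sketch below illustrates that idea only; the rule functions, field names, and the `replicate` helper are hypothetical and are not the SLT configuration interface.

```python
# Hypothetical rules: each takes a source row (a dict) and returns the row to
# write to the target, or None to skip the row entirely.
def only_company_1000(row):
    return row if row["company"] == "1000" else None


def mask_customer_name(row):
    return {**row, "customer": row["customer"][0] + "***"}


def replicate(rows, rules):
    """Pass every source row through the rules, in order, before loading it."""
    target = []
    for row in rows:
        for rule in rules:
            row = rule(row)
            if row is None:
                break
        if row is not None:
            target.append(row)
    return target


source_rows = [
    {"company": "1000", "customer": "Miller", "amount": 50},
    {"company": "2000", "customer": "Schmidt", "amount": 75},
]
loaded = replicate(source_rows, [only_company_1000, mask_customer_name])
print(loaded)
```

The first rule acts as a filter and the second changes a field value, which are the two kinds of adjustment a rule can make in this simplified model.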
What are the advantages of SLT replication?
SAP SLT uses a trigger-based approach; this approach has no measurable performance impact on the source system
It offers filtering and transformation capabilities
It enables real-time data replication, replicating only relevant data into HANA from SAP and non-SAP source systems
It is fully integrated with HANA Studio
Replication from several source systems to one HANA system is allowed, as is replication from one source system to multiple HANA systems.
What is the role of the master controller job in SAP HANA?
The job is scheduled on demand and is responsible for:
- Creating database triggers and logging tables in the source system
- Creating Synonyms
- Writing new entries in admin tables in SLT server when a table is replicated/loaded
What happens if replication is suspended for a long period of time, or if there is a system outage of the SLT or HANA system?
If replication is suspended for a long period of time, the size of the logging tables increases.