The open source framework, Hadoop ecosystem, helps to store and process Big Data in a distributed environment. It contains different sub-projects (tools) such as Sqoop, Pig, and Hive.
Hive is an open-source data warehouse infrastructure tool to 1) process 2) query 3) analyze large structured datasets in Hadoop. This platform is used to develop SQL type scripts to do MapReduce operations.
Broadly Hives can be classified into Primitive and Complex data types.
Integer type data can be specified using integral data types, referred as INT. When the data range is smaller than the INT, use SMALLINT and when the data range exceeds the range of INT or is greater than INT, then use BIGINT. TINYINT is smaller than SMALLINT.
The following table depicts various INT data types:
Type |
Postfix |
Example |
TINYINT |
Y | 10Y |
SMALLINT |
S |
10S |
INT |
- |
10 |
BIGINT |
L |
10L |
String type data types are usually represented with single quotes (' ') or double quotes (" "). It contains two data types: VARCHAR and CHAR. Hive follows C-types escape characters.
Data Type |
Length |
VARCHAR |
1 to 65355 |
CHAR |
255 |
The DECIMAL type in Hive is as same as Bigdecimal format of Java. It is used for representing immutable arbitrary precision. Decimal type data is nothing but floating point value with higher range than DOUBLE data type. The range of decimal type is approximately 10308. to . The syntax and example is as follows:
DECIMAL(precision, scale)
decimal(10,0)
Union is a collection of heterogeneous data types. You can create an instance using create union. The syntax and example is as follows:
UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>
{0:1}
{1:2.0}
{2:["three","four"]}
{3:{"a":5,"b":"five"}}
{2:["six","seven"]}
{3:{"a":8,"b":"eight"}}
{0:9}
{1:10.0}
Numbers with decimal points are referred as Floating Point types. Generally, this type of data is composed of DOUBLE data type.
Missing values are represented by NULL.
The Hive complex data types are as follows:
arrays: ARRAY<data_type>
maps: MAP<primitive_type, data_type>
structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>
union: UNIONTYPE<data_type, data_type, ...>
To simplify the query a portion of the data stored, Hive organizers tables into partitions. Based on the values of partitioned columns the data tables are segregated into parts. Which means the data within a table is split across multiple partitions. Particular value(s) of partition column(s) corresponds to each partition, which is stored in sub-directory of table’s directory on HDFS.
To add a table we need to alternate the table first. Let us assume we have a table called student with fields such as Id, Name, Class, & Section.
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;
partition_spec:
: (p_column = p_col_value, p_column = p_col_value, ...)
The following query is used to add a partition to the employee table.
hive > ALTER TABLE student > ADD PARTITION (year=’2013’) > location '/2012/part2012';
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
The following query is used to rename a partition:
hive > ALTER TABLE student PARTITION (Class=’6’) > RENAME TO PARTITION (Section=’b’);
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...;
The following query is used to drop a partition:
hive > ALTER TABLE employee DROP [IF EXISTS] > PARTITION (Class=’6’);
Natasha is a Content Manager at SpringPeople. She has been in the edu-tech industry for 7+ years. With a aim to provide the best bona fide information on tech trends, she is associated with SpringPeople. SpringPeople is a global premier training provider for high-end and emerging technologies, methodologies and products. Partnered with parent organizations behind these technologies, SpringPeople delivers authentic and most comprehensive training on related topics.
You liked the article?
Like: 0
Vote for difficulty
Current difficulty (Avg): Medium
TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills in the market.