Home
Blog
General
Hive Data Types And Partitioning

Hive Data Types And Partitioning

Ratings:

(4)Views: 624

Share this blog:

The open source framework, Hadoop ecosystem, helps to store and process Big Data in a distributed environment. It contains different sub-projects (tools) such as Sqoop, Pig, and Hive.

Hive is an open-source data warehouse infrastructure tool to 1) process 2) query 3) analyze large structured datasets in Hadoop. This platform is used to develop SQL type scripts to do MapReduce operations.

Hive Data Types

Broadly Hives can be classified into Primitive and Complex data types.

Hive Primitive Data Types

1. Integral Types

Integer type data can be specified using integral data types, referred as INT. When the data range is smaller than the INT, use SMALLINT and when the data range exceeds the range of INT or is greater than INT, then use BIGINT. TINYINT is smaller than SMALLINT.

The following table depicts various INT data types:

Type	Postfix	Example
TINYINT	Y	10Y
SMALLINT	S	10S
INT	-	10
BIGINT	L	10L

2. Numeric Types

TINYINT (1-byte signed integer, from -128 to 127)
SMALLINT (2-byte signed integer, from -32,768 to 32,767)
INT (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
BIGINT (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
FLOAT (4-byte single precision floating point number)
DOUBLE (8-byte double precision floating point number)
DECIMAL (Hive 0.13.0 introduced user definable precision and scale)

3. String Types

String type data types are usually represented with single quotes (' ') or double quotes (" "). It contains two data types: VARCHAR and CHAR. Hive follows C-types escape characters.

Data Type	Length
VARCHAR	1 to 65355
CHAR	255

4. Date or Time Types

TIMESTAMP - It supports java.sql and also the traditional UNIX timestamp with optional nanosecond precision. Timestamp format “YYYY-MM-DD HH:MM:SS.fffffffff” and format “yyyy-mm-dd hh:mm:ss.ffffffffff”.
DATE - DATE values are described in year/month/day format in the form {{YYYY-MM-DD}}.

5. Decimals

The DECIMAL type in Hive is as same as Bigdecimal format of Java. It is used for representing immutable arbitrary precision. Decimal type data is nothing but floating point value with higher range than DOUBLE data type. The range of decimal type is approximately 10308. to . The syntax and example is as follows:

DECIMAL(precision, scale)

decimal(10,0)

6. Union Types

Union is a collection of heterogeneous data types. You can create an instance using create union. The syntax and example is as follows:

UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>

{0:1}

{1:2.0}

{2:["three","four"]}

{3:{"a":5,"b":"five"}}

{2:["six","seven"]}

{3:{"a":8,"b":"eight"}}

{0:9}

{1:10.0}

7. Floating Point Types

Numbers with decimal points are referred as Floating Point types. Generally, this type of data is composed of DOUBLE data type.

8. Null Value

Missing values are represented by NULL.

Hive Complex Data Types

The Hive complex data types are as follows:

arrays: ARRAY<data_type>

maps: MAP<primitive_type, data_type>

structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>

union: UNIONTYPE<data_type, data_type, ...>

Hive Partitions

To simplify the query a portion of the data stored, Hive organizers tables into partitions. Based on the values of partitioned columns the data tables are segregated into parts. Which means the data within a table is split across multiple partitions. Particular value(s) of partition column(s) corresponds to each partition, which is stored in sub-directory of table’s directory on HDFS.

How to Add Partition?

To add a table we need to alternate the table first. Let us assume we have a table called student with fields such as Id, Name, Class, & Section.

Syntax:

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;

partition_spec:

: (p_column = p_col_value, p_column = p_col_value, ...)

The following query is used to add a partition to the employee table.

hive > ALTER TABLE student > ADD PARTITION (year=’2013’) > location '/2012/part2012';

How to Rename a Partition?

Syntax:

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

The following query is used to rename a partition:

hive > ALTER TABLE student PARTITION (Class=’6’) > RENAME TO PARTITION (Section=’b’);

How to Drop a Partition?

Syntax:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...;

The following query is used to drop a partition:

hive > ALTER TABLE employee DROP [IF EXISTS] > PARTITION (Class=’6’);

Author Bio

Natasha is a Content Manager at SpringPeople. She has been in the edu-tech industry for 7+ years. With a aim to provide the best bona fide information on tech trends, she is associated with SpringPeople. SpringPeople is a global premier training provider for high-end and emerging technologies, methodologies and products. Partnered with parent organizations behind these technologies, SpringPeople delivers authentic and most comprehensive training on related topics.

You liked the article?

Like: 0

Vote for difficulty

Current difficulty (Avg): Medium

EasyMediumHardDifficultExpert

IMPROVE ARTICLEReport Issue

Recommended Courses

ETL Testing Training 4.62732 : Learners

IBM BigFix Training 4.83213 : Learners

1/15

About Author

Name

TekSlate

Author Bio

TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills in the market.

Stay Updated

Get stories of change makers and innovators from the startup ecosystem in your inbox

Related Blogs

Hive Data Types And Partitioning

Hive Data Types

Hive Primitive Data Types

1. Integral Types

2. Numeric Types

3. String Types

4. Date or Time Types

5. Decimals

6. Union Types

7. Floating Point Types

8. Null Value

Hive Complex Data Types

Hive Partitions

How to Add Partition?

Syntax:

How to Rename a Partition?

Syntax:

How to Drop a Partition?

Syntax:

Author Bio

Recommended Articles

Recommended Courses

About Author