Hive Data Types And Partitioning

Ratings:
(4)
Views:624
Banner-Img
  • Share this blog:

The open source framework, Hadoop ecosystem, helps to store and process Big Data in a distributed environment. It contains different sub-projects (tools) such as Sqoop, Pig, and Hive.

Hive is an open-source data warehouse infrastructure tool to 1) process 2) query 3) analyze large structured datasets in Hadoop. This platform is used to develop SQL type scripts to do MapReduce operations.

Hive Data Types

Broadly Hives can be classified into Primitive and Complex data types.

Hive Primitive Data Types

1. Integral Types

Integer type data can be specified using integral data types, referred as INT. When the data range is smaller than the INT, use SMALLINT and when the data range exceeds the range of INT or is greater than INT, then use BIGINT. TINYINT is smaller than SMALLINT.

The following table depicts various INT data types:

Type

Postfix

Example

TINYINT

Y 10Y

SMALLINT

S

10S

INT

-

10

BIGINT

L

10L

2. Numeric Types

  • TINYINT (1-byte signed integer, from -128 to 127)
  • SMALLINT (2-byte signed integer, from -32,768 to 32,767)
  • INT (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
  • BIGINT (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
  • FLOAT (4-byte single precision floating point number)
  • DOUBLE (8-byte double precision floating point number)
  • DECIMAL (Hive 0.13.0 introduced user definable precision and scale)

3. String Types

String type data types are usually represented with single quotes (' ') or double quotes (" "). It contains two data types: VARCHAR and CHAR. Hive follows C-types escape characters.

Data Type

Length

VARCHAR

1 to 65355

CHAR

255

4. Date or Time Types

  • TIMESTAMP - It supports java.sql and also the traditional UNIX timestamp with optional nanosecond precision. Timestamp format “YYYY-MM-DD HH:MM:SS.fffffffff” and format “yyyy-mm-dd hh:mm:ss.ffffffffff”.
  • DATE - DATE values are described in year/month/day format in the form {{YYYY-MM-DD}}.

5. Decimals

The DECIMAL type in Hive is as same as Bigdecimal format of Java. It is used for representing immutable arbitrary precision. Decimal type data is nothing but floating point value with higher range than DOUBLE data type. The range of decimal type is approximately 10308. to .  The syntax and example is as follows:

DECIMAL(precision, scale)

decimal(10,0)

6. Union Types

Union is a collection of heterogeneous data types. You can create an instance using create union. The syntax and example is as follows:

UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>

{0:1}

{1:2.0}

{2:["three","four"]}

{3:{"a":5,"b":"five"}}

{2:["six","seven"]}

{3:{"a":8,"b":"eight"}}

{0:9}

{1:10.0}

7. Floating Point Types

Numbers with decimal points are referred as Floating Point types. Generally, this type of data is composed of DOUBLE data type.

8. Null Value

Missing values are represented by NULL.

Hive Complex Data Types

The Hive complex data types are as follows:

arrays: ARRAY<data_type>

maps: MAP<primitive_type, data_type>

structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>

union: UNIONTYPE<data_type, data_type, ...>

Hive Partitions

To simplify the query a portion of the data stored, Hive organizers tables into partitions. Based on the values of partitioned columns the data tables are segregated into parts. Which means the data within a table is split across multiple partitions. Particular value(s) of partition column(s) corresponds to each partition, which is stored in sub-directory of table’s directory on HDFS.

How to Add Partition?

To add a table we need to alternate the table first. Let us assume we have a table called student with fields such as Id, Name, Class, & Section.

Syntax:

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;

partition_spec:

: (p_column = p_col_value, p_column = p_col_value, ...)

The following query is used to add a partition to the employee table.

hive > ALTER TABLE student > ADD PARTITION (year=’2013’) > location '/2012/part2012';

How to Rename a Partition?

Syntax:

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

The following query is used to rename a partition:

hive   > ALTER TABLE student PARTITION (Class=’6’) > RENAME TO PARTITION (Section=’b’);

How to Drop a Partition?

Syntax:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...;

The following query is used to drop a partition:

hive  > ALTER TABLE employee DROP [IF EXISTS] > PARTITION (Class=’6’);


 

Author Bio

Natasha is a Content Manager at SpringPeople. She has been in the edu-tech industry for 7+ years. With a aim to provide the best bona fide information on tech trends, she is associated with SpringPeople. SpringPeople is a global premier training provider for high-end and emerging technologies, methodologies and products. Partnered with parent organizations behind these technologies, SpringPeople delivers authentic and most comprehensive training on related topics.

You liked the article?

Like : 0

Vote for difficulty

Current difficulty (Avg): Medium

Recommended Courses

1/15

About Author
Authorlogo
Name
TekSlate
Author Bio

TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills in the market.


Stay Updated


Get stories of change makers and innovators from the startup ecosystem in your inbox