After discussing the use and benefits of Hive/Impala partitions, we will look deeper at how Hive implements partitioning at the low level.
First of all, we should determine where in HDFS hive metastore keeps its tables. This path is represented by hive parameter hive.metastore.warehouse.dir that can be changed in hive-sire.xml file or via Cloudera manager (under Hive service -> configuration):

Once we have the path we can examine it in HDFS:
[[email protected] ~]$ hdfs dfs -ls /user/hive/warehouse Found 6 items drwxr-xr-x - hive hive 0 2016-06-06 14:30 /user/hive/warehouse/_tmp.sampledata1 drwxrwxrwt - hive hive 0 2016-09-15 11:32 /user/hive/warehouse/sampledata drwxrwxrwt - hive hive 0 2016-06-07 16:24 /user/hive/warehouse/sampledata1 drwxrwxrwt - hive hive 0 2016-09-15 11:40 /user/hive/warehouse/sampledata_p drwxrwxrwt - hive hive 0 2016-09-29 14:41 /user/hive/warehouse/zip drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part
We are, of course, interested in the last two tables, zip and zip_part. First, let’s examine the unpartitioned table:
[[email protected] ~]$ hdfs dfs -ls /user/hive/warehouse/zip Found 3 items -rwxrwxrwt 3 hive hive 773604 2016-09-29 13:28 /user/hive/warehouse/zip/14zpallagi.csv -rwxrwxrwt 3 hive hive 21660912 2016-09-29 14:39 /user/hive/warehouse/zip/data.csv -rwxrwxrwt 3 hive hive 21660912 2016-09-29 14:41 /user/hive/warehouse/zip/data_copy_1.csv
We can see that the original data file was just copied into the zip directory as is. Let’s copy the file to local filesystem and take a look at its contents:
[[email protected] ~]$ hdfs dfs -copyToLocal /user/hive/warehouse/zip/data.csv /var/lib/hadoop-hdfs [[email protected] ~]$ tail -10 /var/lib/hadoop-hdfs/data.csv 12,FL,34737,5,230,40,190,0 12,FL,34737,6,70,0,50,0 12,FL,34739,1,130,70,40,20 12,FL,34739,2,90,40,50,0 12,FL,34739,3,50,0,50,0 12,FL,34739,4,0,0,0,0 12,FL,34739,5,30,0,20,0 12,FL,34739,6,0,0,0,0 12,FL,34741,1,13790,7280,1670,4650 12,FL,34741,2,5770,2350,1580,1680
You can see that there are Eight columns in the file including the state code. Now we will try to do the same with the partitioned table:
[[email protected] ~]$ hdfs dfs -ls /user/hive/warehouse/zip_part Found 11 items drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=AK drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=AL drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=AR drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=AZ drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=CA drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=CO drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=CT drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=DC drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=DE drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=FL drwxrwxrwt - hive hive 0 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=STATE
Here you can see that under the basic table directory there are numerous sub directories, corresponding to the table’s partitions, along with a general partition “state=STATE” which is used for rows that do not fall into any other partition. Let’s see what’s inside of them:
[[email protected] ~]$ hdfs dfs -ls /user/hive/warehouse/zip_part/state=FL Found 1 items -rwxrwxrwt 3 hive hive 195396 2016-09-29 14:46 /user/hive/warehouse/zip_part/state=FL/000000_0
Now we will copy it to local filesystem and see it’s contents:
[[email protected] ~]$ hdfs dfs -copyToLocal /user/hive/warehouse/zip_part/state=FL/000000_0 /var/lib/hadoop-hdfs [[email protected] ~]$ ls 000000_0 [[email protected] ~]$ tail -10 000000_0 12,34737,5,230,40,190,0 12,34737,6,70,0,50,0 12,34739,1,130,70,40,20 12,34739,2,90,40,50,0 12,34739,3,50,0,50,0 12,34739,4,0,0,0,0 12,34739,5,30,0,20,0 12,34739,6,0,0,0,0 12,34741,1,13790,7280,1670,4650 12,34741,2,5770,2350,1580,1680
If we compare the content of this file to the content of the unpatitioned file above we can see that the unpartitioned file has Eight columns including state. The partitioned table has only Seven columns and the state column is missing. This is because the state column is already contained in the directory name and thus hive can infer the “missing” column from the directory structure instead of having redundant data.
And what happen if we partition on more than one column ?
Let’s recreate the zip_part table, this time partitioned by two columns:
create table zip_part ( zipcode int, agi_stub int, n1 int, mars1 int, mars2 int, mars4 int) partitioned by (statefips string, state string) row format delimited fields terminated by ',' stored as textfile; insert into zip_part partition(statefips, state) select zipcode,agi_stub,n1,mars1,mars2,mars4,statefips,state from zip;
Then, if we examine the underlying directory structure, we can see that it has just added another level of subdirectories, corresponding to the added partition key:
[[email protected] ~]$ hdfs dfs -ls /user/hive/warehouse/zip_part Found 11 items drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=1 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=10 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=11 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=12 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=2 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=4 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=5 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=6 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=8 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=9 drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=STATEFIPS [[email protected] ~]$ hdfs dfs -ls /user/hive/warehouse/zip_part/statefips=2 Found 1 items drwxrwxrwt - hive hive 0 2016-10-13 16:28 /user/hive/warehouse/zip_part/statefips=2/state=AK
This concludes the Hive/Impala partitioning coverage.