Category Archives: Impala

Setting default resource pool for JDBC connections

This is a quick tip about connecting to Hive or Impala via JDBC. Accessing Hive or Impala through their JDBC drivers is very convenient. Client programs such as Beeline or JetBrains DataGrip use JDBC as their main way of accessing Hive/Impala, and many people also use it in their own programs. Things get a… Read More »

Exploring Hive/Impala partitions – continued

After discussing the use and benefits of Hive/Impala partitions, we will look deeper at how Hive implements partitioning at a low level. First of all, we should determine where in HDFS the Hive metastore keeps its tables. This path is given by the Hive parameter hive.metastore.warehouse.dir, which can be changed in the hive-site.xml file or via Cloudera Manager (under Hive… Read More »

Exploring Hive/Impala partitions

Table partitioning is a common practice in the RDBMS world, and all major databases support it. Basically, it means splitting the table data into several physical parts based on a function, range, or value of a column or set of columns, while keeping all the data under the same logical unit, the table. This… Read More »