Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, Raghotham Murthy: Hive - A Warehousing Solution Over a Map-Reduce Framework. VLDB, 2009.
To deal with large amounts of data, Hadoop provides a map-reduce framework. However, the map-reduce programming model is low level, and programs written directly against it are hard to develop and maintain.
In this paper, the authors propose a higher-level data warehousing solution called Hive. It supports queries expressed in a SQL-like language, HiveQL, which are compiled into map-reduce jobs executed on Hadoop.
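To see why this raises the level of abstraction, consider a simple aggregation. In HiveQL it is one line, while by hand it requires writing a full map-reduce job. The sketch below simulates such a hand-written job in Python; the table and field names (`status_updates`, `status`) are illustrative, not taken from the paper.

```python
from itertools import groupby
from operator import itemgetter

# A HiveQL query like:
#   SELECT status, COUNT(1) FROM status_updates GROUP BY status;
# compiles into a map-reduce job. Hand-written, the same job looks like:

def mapper(record):
    # emit (status, 1) for each row
    yield (record["status"], 1)

def reducer(key, values):
    # sum the counts for one key
    yield (key, sum(values))

def run_job(records):
    # simulate the shuffle phase: sort intermediate pairs by key
    intermediate = sorted(
        (pair for rec in records for pair in mapper(rec)),
        key=itemgetter(0),
    )
    out = {}
    for key, group in groupby(intermediate, key=itemgetter(0)):
        for k, v in reducer(key, (v for _, v in group)):
            out[k] = v
    return out

rows = [{"status": "ok"}, {"status": "err"}, {"status": "ok"}]
print(run_job(rows))  # {'err': 1, 'ok': 2}
```

Even this toy version needs explicit mapper, reducer, and shuffle logic; Hive's compiler generates the equivalent plumbing from the one-line query.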
The following figure shows the architecture of Hive.
In summary, Hive consists of several components:
1. Hive provides external interfaces for both a user command line (CLI) and a web UI.
2. The Hive Thrift Server exposes a simple client API to execute HiveQL statements. Thrift is a framework for cross-language services: a server written in one language can support clients written in other languages. The Thrift Hive clients generated in different languages are used to build common drivers like JDBC (Java), ODBC (C++), and scripting drivers written in PHP, Perl, Python, etc.
3. The Metastore is the system catalog. All other components of Hive interact with the metastore.
4. The Driver manages the life cycle of a HiveQL statement during compilation, optimization, and execution.
5. The Compiler is invoked by the driver upon receiving a HiveQL statement. The compiler translates the statement into a plan consisting of a DAG of map-reduce jobs.
6. The driver submits the individual map-reduce jobs from the DAG to the Execution Engine in topological order.
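The topological submission in step 6 can be sketched with Python's standard-library `graphlib`. The stage names below are hypothetical, and `submit` is a stand-in for handing a job to the execution engine; the point is only that a job runs after all jobs it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical plan DAG: each job maps to the set of jobs it depends on.
dag = {
    "join_stage": set(),
    "groupby_stage": {"join_stage"},
    "output_stage": {"groupby_stage"},
}

def submit(job):
    # stand-in for submitting one map-reduce job to the execution engine
    return f"submitted {job}"

# static_order() yields jobs so that every dependency comes first
order = list(TopologicalSorter(dag).static_order())
for job in order:
    print(submit(job))
# join_stage is submitted before groupby_stage, which precedes output_stage
```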
The metastore's storage system should be optimized for online transactions with random accesses and updates. The metastore therefore uses a traditional relational database (such as MySQL or Oracle) rather than HDFS.
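A toy illustration of this access pattern, using `sqlite3` as a stand-in relational store (the table and column names here are illustrative, not Hive's real metastore schema): compilation needs fast point lookups and in-place metadata updates, which an append-oriented file system like HDFS does not support well.

```python
import sqlite3

# Toy stand-in for the metastore: table metadata kept in a relational DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbls (name TEXT PRIMARY KEY, location TEXT)")
conn.execute(
    "INSERT INTO tbls VALUES ('status_updates', '/warehouse/status_updates')"
)

# Query compilation needs fast random-access point lookups like this:
row = conn.execute(
    "SELECT location FROM tbls WHERE name = ?", ("status_updates",)
).fetchone()
print(row[0])  # /warehouse/status_updates

# Metadata changes (e.g. an ALTER TABLE) are in-place updates,
# which is why an RDBMS fits better than HDFS here:
conn.execute(
    "UPDATE tbls SET location = '/warehouse/su_v2' "
    "WHERE name = 'status_updates'"
)
```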