Publisher: Packt Publishing
Language : English
Paperback : 332 pages [ 235mm x 191mm ]
Release Date : August 2012
ISBN : 1849517142
ISBN 13 : 9781849517140
Author(s) : Yifeng Jiang
Available at : Packt Publishing Website, Amazon
HBase is an Apache project designed to enable realtime access to very large datasets. To do this it builds on components from the Hadoop ecosystem: HDFS (a distributed, massive-scale, fault-tolerant file system modelled on GFS), ZooKeeper and Hadoop MapReduce.
The newly published book from Packt Publishing is designed to bring an administrator up to speed with creating an HBase cluster and to help them with a wide range of tasks. It presents issues and solutions as recipes, which are grouped into 9 high-level chapters. This unfortunately leads to a little repetition in some of the technical detail, but it helps a reader who arrives knowing their issue: they can easily find a matching recipe in a section that makes sense rather than having to dig around. The recipes are good, although they shy away from stating best practice, which is probably for the best as the technology in question is maturing rapidly.
The book can also be read sequentially; the recipes are well written, with clear step-by-step instructions for achieving a highly available, monitored HBase setup (optionally with Hive) in either a local environment or on a cloud-based provider such as EC2. Note that running on a cloud brings its own considerations, and the book provides pragmatic, usable solutions that you can implement yourself.
The book addresses topics such as maintenance and security, and troubleshooting and tuning of HBase (and implicitly Hadoop, MapReduce and the underlying hardware). Recovery of Hadoop clusters is covered to the extent I would expect to be practical in a published book, but your mileage may vary as to whether these recipes are pertinent to your infrastructure.
The book presents Hadoop concepts simply enough for a first-timer to come to this topic with no prior knowledge of any Hadoop components and still be able to perform the exercises. However, it does get into advanced topics later on, and some of the things that are covered (such as manual region splitting and key management) left me with the impression that people unfamiliar with Hadoop would need either real-world experience of such matters or alternative sources of information to gain a better understanding of what they were doing and why. After reading this book they would have a good foundational knowledge of what a cluster should look like and should have been introduced to enough concepts to find out more by themselves.
Overall I consider this book to be well written. The author has demonstrable knowledge of system administration and practical experience with installing and managing Hadoop and HBase installations. Well done Yifeng.