Tuesday, May 18, 2010

Book review - Pulling Strings with Puppet: Configuration Management Made Easy

Puppet is an amazing tool for keeping everything on a cluster in sync. We use it for Apache Cassandra, Hadoop and our own internal software distribution.
It's a short book, but given how lightly documented the open-source Puppet project is, I found it useful for getting started with automating machine administration. However, now that I am using Puppet more and more, I realize that what holds the book back is that it is so thin that it lacks complete examples. This is really a critical flaw. Maybe if it came with source code on a website this hole could be plugged. In the end I got a lot out of online tutorials and then used this book for reference and reminders. I've since mostly moved past the book and now use the reference guides on the reductivelabs/puppetlabs website:
http://docs.puppetlabs.com/references/stable/configuration.html
http://docs.puppetlabs.com/references/latest/type.html#package

Another annoyance is the lack of an index, so I recommend the ebook.
Final comments: 

- Puppet is based on Ruby, so a light understanding of Ruby does help (especially if you need to patch Puppet).
- I did have to patch the version of Puppet on my Red Hat 4 box, which I got from the EPEL yum repo, since it was failing on templates and causing a SEGV in Ruby. See http://projects.reductivelabs.com/issues/2604

Book review - Hadoop: The Definitive Guide by Tom White

I really enjoyed the book "Hadoop: The Definitive Guide" by Tom White.
It has everything you need to:
a) Get started running your own cluster and writing your own MapReduce jobs
b) Understand how to administer the cluster
c) Troubleshoot your programs
d) Learn about really important side projects like Pig, Hive, ZooKeeper and HBase (of which I think Hive is the most amazing)

One thing I wish I'd done is go through the Cloudera online tutorials BEFORE reading this book. If I'd done that (instead of doing so afterwards) I think I'd have got through certain sections of the book much quicker; basically I would have 'got it' sooner. See http://www.cloudera.com/resources/?type=Training


After reading the book I organized a little geek meet where I gave the development team a synopsis of Hadoop, Pig and Hive. I also introduced them to the Cloudera training virtual machine, which is just an amazing resource for learning Hadoop et al. It also introduced me to some cool things like the Sqoop program (http://www.cloudera.com/developers/downloads/sqoop/) for reading tables out of an RDBMS like MySQL or Oracle and auto-populating Hadoop and/or Hive. Very useful!