One interesting "learning" around HBase is that it's not really a good idea to use it for storing tons of binary data, e.g. photos, map tiles, audio files, etc.
http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
http://www.quora.com/Apache-Hadoop/How-would-HBase-compare-to-Facebooks-Haystack-for-photo-storage
...the message and experience here is to store the metadata in HBase, but keep the actual binary data outside HBase. I'm hearing others echo this learning.
There are clearly issues in how partitioning in HBase works and the way in which it spreads the workload across nodes and rebalances itself. Interestingly, Apache Cassandra has two partitioners out of the box: random and ordered. As I understand it, the HBase partitioner is closer to the ordered version, so trying these same use cases on Cassandra with a random partitioner might make an interesting compare and contrast.
My working assumption is that it's also safer to store binary data outside Cassandra if you want constant, predictable response times, and to rely instead on highly available (i.e. replicated) storage that is really good at binary data that is write light and read heavy.
I'm interested to hear from anyone using Apache Cassandra who is storing large amounts of binary data (upwards of 10s of TBs).
S.
Friday, May 13, 2011
Monday, November 29, 2010
Getting status out of JBoss
A question recently arose on how to monitor JBoss from the load balancer. Since we can't use JMX or the jmx-console (our usual method) from the load balancer, we needed some HTTP endpoints that were easy to use out of the box (i.e. we didn't want to write our own servlet/JSP page). Two good candidates were:
http://hostname/jboss/status
and (if you are using web services)
http://hostname/jbossws/services
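For anyone who wants to try the endpoints by hand before pointing the load balancer at them, here is a minimal sketch using curl. The function name check_http_ok and the 5 second timeout are my own assumptions, not anything JBoss or the load balancer requires; it simply treats any HTTP success status as "up".

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the URL answers with an HTTP success status.
check_http_ok() {
    local url="$1"
    # -f: treat HTTP errors (4xx/5xx) as failures
    # -s: no progress output; -o /dev/null: discard the response body
    # --max-time 5: give up after 5 seconds (an assumed, arbitrary limit)
    curl -f -s -o /dev/null --max-time 5 "$url"
}
```

Usage would be something like: check_http_ok http://hostname/jboss/status && echo UP || echo DOWN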
OK, that's it for now!
Tuesday, May 18, 2010
Book review - Pulling Strings with Puppet; configuration mgmt made easy
Puppet is an amazing tool for keeping everything on a cluster in sync. I/we use it for Apache Cassandra, Hadoop and our own internal software distribution.
It's a short book, but given the light level of documentation around the Puppet open-source project, I found it useful for getting started with automating machine administration. However, now that I am using Puppet more and more, I realize what holds it back: it's such a thin book that it lacks complete examples. This is really a critical flaw. Maybe if it came with source code on a website this hole could be plugged. In the end I got a lot out of online tutorials and then used this book for reference/reminders. I've now mostly moved past this book and use the reference guides on the reductivelabs/puppetlabs website:
http://docs.puppetlabs.com/references/stable/configuration.html
http://docs.puppetlabs.com/references/latest/type.html#package
Another annoyance is the lack of an index so I recommend the ebook.
Final comments:
- Puppet is based on Ruby, so a light understanding of Ruby does help (especially if you need to patch Puppet).
- I did have to patch the version of Puppet on my Red Hat 4 box, which I got from the EPEL yum repo, since it was failing on templates and causing a SEGV in Ruby. See http://projects.reductivelabs.com/issues/2604
Book review - Hadoop: The Definitive Guide by Tom White
I really enjoyed the book "Hadoop: The Definitive Guide" by Tom White.
It has everything you need to:
a) Get started running your own cluster and writing your own MR jobs
b) Understand how to administer the cluster
c) Troubleshoot your programs
d) Learn about really important side projects like Pig, Hive, Zookeeper and HBase (of which I think Hive is the most amazing)
One thing I wish I'd done is go through the cloudera online tutorials BEFORE reading this book. If I'd done that (instead of doing so afterwards) I think I'd have got through certain sections of the book much quicker; basically I would have 'got it' quicker. See http://www.cloudera.com/resources/?type=Training
After reading the book I organized a little geek meet where I covered a synopsis of Hadoop, Pig and Hive with the development team. I also introduced them to the Cloudera training virtual machine. That is just an amazing resource for learning hadoop et al. It also introduced me to some unique cool things like the sqoop program (http://www.cloudera.com/developers/downloads/sqoop/) for reading tables out of an RDBMS like MySQL or Oracle and auto populating Hadoop and/or Hive...very useful!
Friday, March 26, 2010
Enabling ssh-agent for password-less ssh login on KDE/Gnome
So one of the things that had been bothering me was ssh'ing into remote machines with keys that had passwords. I wanted to use ssh-agent so that I would not have to type in my password. Trouble was that I couldn't figure out how to do it on my KDE desktop so that every time I opened a new shell the ssh-agent would be active. Everything I'd previously read talked about executing the command:
ssh-agent bash
...but this only starts the agent for the shell it launches and any child processes of that shell. Consequently, every shell you open has its own ssh-agent and you have to run ssh-add in each one, typing in your password each time.
Well, here is how to do it:
start-ssh-agent script
#!/bin/bash
if [ -f ~/.ssh/ssh-agent.env ]; then
    #echo "Agent already started"
    # ':' is bash's built-in no-op; the 'then' branch needs at least one statement
    :
else
    # ssh-agent prints the shell commands that set SSH_AUTH_SOCK and SSH_AGENT_PID
    ssh-agent > ~/.ssh/ssh-agent.env
    #we need to delete the echo from the source script since some
    #commands like scp and ssh hate it when a shell startup file echoes stuff out
    sed -e '/echo/d' ~/.ssh/ssh-agent.env > ~/.ssh/ssh-agent2.env
    mv ~/.ssh/ssh-agent2.env ~/.ssh/ssh-agent.env
    . ~/.ssh/ssh-agent.env
    #echo "Agent started"
    ssh-add
fi
Basically this script executes ssh-agent, captures the output that specifies the environment variables and writes them to a file for future reference from future shells. It then executes ssh-add to prompt you to enter the passwords for the private keys.
stop-ssh-agent script
#!/bin/bash
if [ -f ~/.ssh/ssh-agent.env ]; then
    . ~/.ssh/ssh-agent.env > /dev/null
    kill "$SSH_AGENT_PID"
    rm ~/.ssh/ssh-agent.env
    echo "Agent stopped"
else
    echo "Agent is not running"
fi
Then in your ~/.bashrc file you add the following:
if [ -f ~/.ssh/ssh-agent.env ]; then
    . ~/.ssh/ssh-agent.env
else
    ~/bin/start-ssh-agent
    # the script runs in a child process, so source the file it wrote
    # to get the agent variables into this shell as well
    . ~/.ssh/ssh-agent.env
fi
...this basically means...
if the ssh-agent.env file exists
    source it so that the environment vars point to the running ssh-agent process.
else
    run the script to start the ssh-agent and prompt for the passwords for any keys, then source the file it wrote
This is not perfect and you need to be careful if you are doing agent forwarding into the box but for most general cases this works.
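One of the ways it is not perfect: if the agent dies (e.g. after a reboot) the ssh-agent.env file is left behind and new shells will point at an agent that no longer exists. Here is a rough sketch of a check for that, not something I have battle tested. The function name agent_is_alive is made up for this example; it relies on ssh-add exiting with status 2 when it cannot contact the agent (0 and 1 both mean the agent answered).

```shell
#!/bin/bash
# Hypothetical helper: returns 0 only if the env file exists AND the agent
# it points to actually answers.
agent_is_alive() {
    local env_file="$1"
    [ -f "$env_file" ] || return 1
    # Use a subshell so a stale env file cannot pollute the current shell.
    (
        . "$env_file" > /dev/null
        status=0
        ssh-add -l > /dev/null 2>&1 || status=$?
        # 0 = agent has keys, 1 = agent reachable but has no keys,
        # anything else (2 = cannot contact agent) counts as dead
        case "$status" in
            0|1) exit 0 ;;
            *)   exit 1 ;;
        esac
    )
}
```

In ~/.bashrc you could then call agent_is_alive ~/.ssh/ssh-agent.env first and, if it fails, remove the stale file before running ~/bin/start-ssh-agent.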
Sunday, February 21, 2010
Good and bad: NUFC promotion to Premiership
Over the past year my Newcastle United RSS feed has been very different to the year before. We've only lost 4 times away and the results have often been 3-0, 4-1, etc. It's been a joy to read the news. However, that good feeling that I get on a Monday morning is going to change with promotion back into the Premiership. Honestly, I've got mixed feelings about promotion now that I am so used to hearing good news, when the most we can seem to hope for is a solid mid-table performance in the Premier League.
Tuesday, December 01, 2009
Book Review - Wicket In Action by Manning
I just finished reading Wicket in Action by Manning. The book is well laid out; I particularly liked how the simple example web site (cheesr) is grown through the book in line with the topics of the chapter. In addition to the stuff that you expect (such as how to work with/customize components, models, etc) there is also good coverage of important topics like I18N, testing, integration with frameworks like hibernate and spring, and integration with JavaScript engines (other than the wicket JavaScript engine).
Regarding Wicket itself: I really like this framework (we use it in my current team and it produces nice UIs that are pretty easy to maintain and change). Why do I like Wicket? Well for the following reasons:
- It's Java based, and since I'm strongest in Java it suits me.
- There is a nice separation between the UI in HTML/CSS/JS and the java code that backs it. This clear separation between the presentation/design aspect and the coding is useful because it separates along the common skill groups. JavaFX (can you say "designer/developer workflow") may change my opinion on this but right now I see advantages over say the JSP approach.
- Testing is well covered (with WicketTester) over and above just using something like Selenium (also mentioned in the book).
- The AJAX support seems solid and flexible. It even leaves you open to using other 3p JavaScript frameworks for your fancy UI components. In particular there is lots of support for request/response queues and falling back to full page refreshing that is particularly attractive.
BTW...I love manning books. The fact that you can get a free ebook when you purchase the print copy is excellent and in general everything I read from the publisher is superb.
Checkpoint VPN Client Tray Icon Disappears...how to get back
One of the issues I have is that the Checkpoint VPN Client Tray Icon (the yellow key) sometimes disappears from my Windows XP taskbar. If I try to re-run Checkpoint it says it's already started. I just found out how to get it back without restarting my laptop: kill the SR_GUI.exe process. Things will automatically restart and the tray icon reappears.
BTW...I hate Checkpoint VPN...compared to Cisco VPN it's an unstable, horrific piece of software. I guess you get what you pay for.
Thursday, October 15, 2009
nxserver incompatibility with jboss ports
nxserver (see www.nomachine.com) is an amazing replacement for vncserver. I REALLY like it. It comes with great support like:
- clipboard copy/paste between your machine and the remote host that always works
- screen resolutions that adapt to your machine with zero headaches
- many more things...check it out!
The solution that in the end worked best for me was to shut down the nxserver sessions and start JBoss first. Then, while JBoss is running, start up nxserver and connect. nxserver will skip over the ports being used by JBoss and use unused ports further up the 7000 range.
NOTE: Even if nxserver is stopped there can be persistent sessions. You can check by running:
- ps -ef | grep nxagent
- netstat -ap | grep nxagent
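If you want to see which ports in the 7000 range are already taken before starting either one, something like the sketch below might help. The function name ports_in_7000_range is made up, and it assumes Linux netstat output where the local address is the fourth column (as with netstat -ltn); adjust if your netstat prints things differently.

```shell
#!/bin/bash
# Hypothetical helper: reads netstat-style lines on stdin and prints the
# local TCP ports that fall in the 7000-7999 range, one per line.
ports_in_7000_range() {
    awk '$1 ~ /^tcp/ {
        # the port is whatever follows the last ":" of the local address
        n = split($4, parts, ":")
        port = parts[n] + 0
        if (port >= 7000 && port <= 7999) print port
    }'
}
```

Usage would be something like: netstat -ltn | ports_in_7000_range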
