Solr Cloud 4.7.0, Tomcat 7 & Zookeeper 3.4.5 in Vagrant

I’ve been attempting to set up Solr Cloud locally in Vagrant for several days, working from various blog entries and the Solr Cloud tutorial, without much luck until now.  I would like to share my experience setting Solr Cloud up, and I will keep building on this post as I come across problems.

Goals and assumptions for this tutorial are:

Solr 4.7.0 – this is the latest version of Solr at the time of writing.
Tomcat 7 – Solr ships with Jetty by default.
ZooKeeper 3.4.5 – Solr has ZooKeeper built in, but using the embedded one in production is not recommended.
Ubuntu LTS 12.04
Vagrant 1.4.1
Virtualbox 4.3.8r92456

Assuming you have Virtualbox and Vagrant installed, clone my git repository to get started.

git clone git@github.com:thehar/thehar-vagrant.git

Type `vagrant up` and let Vagrant do the magic of bringing up your host.  After Vagrant has completed, type `vagrant ssh` and you should have a working VM to start your work in.

Now that we are in the VM, let's start by installing Tomcat 7. Next, let's download Solr and ZooKeeper.

sudo apt-get install tomcat7 tomcat7-admin
wget http://apache.mirrors.timporter.net/zookeeper/current/zookeeper-3.4.5.tar.gz
wget http://mirror.cogentco.com/pub/apache/lucene/solr/4.7.0/solr-4.7.0.tgz

Unpack the downloaded software.

tar xzvf zookeeper-3.4.5.tar.gz
tar xzvf solr-4.7.0.tgz

Create a directory for ZooKeeper data and point the configuration at it.

sudo mkdir -p /opt/zookeeper/data
cd zookeeper-3.4.5/
cp conf/zoo_sample.cfg conf/zoo.cfg
vim conf/zoo.cfg

Change dataDir to /opt/zookeeper/data and save the file.
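If you would rather not open an editor, the same change can be scripted. This is only a sketch: it assumes the config already has a dataDir= line (zoo_sample.cfg does), and set_zk_datadir is a helper name made up for illustration.

```shell
# Hypothetical helper: rewrite the dataDir line of a ZooKeeper config file.
# Usage: set_zk_datadir <config-file> <data-directory>
set_zk_datadir() {
  cfg=$1
  dir=$2
  # Write to a temporary file first so this works with both GNU and BSD sed.
  sed "s|^dataDir=.*|dataDir=$dir|" "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}
```

For the setup here that would be `set_zk_datadir conf/zoo.cfg /opt/zookeeper/data`, run from the zookeeper-3.4.5 directory before starting the server.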

sudo ./bin/zkServer.sh start

Now let's create two Solr shards. Go to the Solr directory and have a look at example. Let's copy example into two directories.

cd ../solr-4.7.0/
sudo cp -a example/ /opt/solr1
sudo cp -a example/ /opt/solr2

Next, start the first Solr instance.  The bootstrap_confdir option uploads the Solr config to ZooKeeper.  Then start the second instance on a different port.

cd /opt/solr1
java -Dbootstrap_confdir=./solr/collection1/conf/ -Dcollection.configName=collection1 -DnumShards=2 -DzkHost=localhost:2181 -jar start.jar
# Background the process: Ctrl+z and bg
cd /opt/solr2
java -Djetty.port=7574 -DzkHost=localhost:2181 -jar start.jar

You can view your empty Solr Cloud at http://localhost:8983/solr/#/~cloud

GPG is annoying

When gpg-agent gets you down:

thehar$ gpg -d derp.txt.gpg

You need a passphrase to unlock the secret key for
user: “Herpy Derp
4096-bit RSA key, ID DEADBEAF, created 2077-01-02

gpg: problem with the agent: No pinentry
gpg: encrypted with 4096-bit RSA key, ID DEADBEAF, created 2077-01-02
“Herpy Derp
gpg: public key decryption failed: Operation cancelled
gpg: decryption failed: No secret key


ps aux | grep gpg
thehar 45676 0.0 0.0 2432768 608 s002 R+ 2:33PM 0:00.00 grep gpg
thehar 45626 0.0 0.0 603400 644 ?? Ss 2:22PM 0:00.02 gpg-agent --daemon --use-standard-socket


kill -9 45626   # the gpg-agent PID from the ps output above

TADA!
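If you do this often, picking the PID out by hand gets old. Here is a small sketch that assumes `ps aux` prints the command in the 11th column, as in the output above; gpg_agent_pids is a name I made up.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PID of
# every gpg-agent process. The `grep gpg` line is skipped because its
# command column is "grep", not "gpg-agent".
gpg_agent_pids() {
  awk '$11 ~ /(^|\/)gpg-agent$/ { print $2 }'
}
```

Usage would look like `ps aux | gpg_agent_pids | xargs kill -9`.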

chef-knife and knife.rb

I find myself needing to do this frequently, so let's post a couple of common code blocks: a useful ~/.chef/knife.rb and a ~/.bashrc profile, in case others need to reference them as much as I do.

knife.rb

current_dir = File.dirname(__FILE__)
log_level :info
log_location STDOUT
user = ENV['OPSCODE_USER'] || ENV['USER']
node_name user
client_key "#{current_dir}/username.pem"
validation_client_name "username-validator"
validation_key "#{current_dir}/username-validator.pem"
chef_server_url "https://api.opscode.com/organizations/organizationname"
cache_type 'BasicFile'
cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )
raise "You must set the ORGNAME environment variable" if ENV['ORGNAME'].nil?
cookbook_path ["~/chef-repo/cookbooks"]
role_path ["~/chef-repo/roles"]

# OpenStack
knife[:openstack_username] = ENV['OS_USERNAME']
knife[:openstack_password] = ENV['OS_PASSWORD']
knife[:openstack_auth_url] = ENV['OS_AUTH_URL']
knife[:openstack_tenant] = ENV['OS_TENANT']
knife[:identity_file] = ENV['OS_IDENTITY_FILE']

~/.bashrc

export OPSCODE_USER=username
export ORGNAME=orgname
export COOKBOOK_COPYRIGHT='Beyond'
export COOKBOOK_EMAIL='email_of_destiny@euphamism.com'
PATH=$PATH:$HOME/.rvm/bin # Add RVM to PATH for scripting

Credit: Eric G. Wolfe
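Since knife.rb reads these values through ENV, they must be exported in the shell that runs knife. The following is my own sketch (not part of the credited config) of a quick check to run first; require_env is a made-up helper name.

```shell
# Hypothetical helper: fail if any named variable is missing from the
# exported environment or is empty.
require_env() {
  missing=0
  for name in "$@"; do
    # `env` lists only exported variables, which is what knife will see.
    if ! env | grep -q "^${name}=."; then
      echo "missing or empty environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_env OPSCODE_USER ORGNAME && knife client list` only runs knife when both variables are set.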

No screen? Disown your process!

Something many people realize too late is that they are not using screen during long jobs like rsyncing 1 TB+ of data.  If you have yet to enjoy something so lovely: always run long jobs in a screen session, tmux, etc.  If for some reason you are running a long command in a plain terminal and need to disconnect, here’s a helpful hint using disown. While the command is running, hit `Ctrl+z` to suspend the foreground job and use `bg` to resume it in the background. Now you can use disown %n, where n is the job number shown by `jobs`. Run `jobs` again after you disown to make sure the job is no longer listed, and confirm the process is still alive with a simple ps aux | grep process.
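The same idea works in a bash script, sketched below under a few assumptions: `sleep` stands in for the real long-running command, the job is started in the background directly (there is no `Ctrl+z` in a script), and the shell is bash, whose disown accepts a PID as well as %n.

```shell
#!/usr/bin/env bash
# Stand-in for a long-running command such as a large rsync.
sleep 5 >/dev/null 2>&1 &
pid=$!

# Remove the job from this shell's job table so the shell will not
# forward SIGHUP to it when the terminal goes away.
disown "$pid"

# The job table no longer lists it, but the process is still running.
jobs
kill -0 "$pid" && echo "still running: $pid"
```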

Ubuntu limit nofile

Ran into an interesting problem this week with file descriptors and TCP sessions on a server in Ubuntu. Using God to monitor a process, I had god starting a process with:

w.start = "ulimit -c unlimited ; ulimit -n 500000 ; /usr/sbin/process /etc/process.conf"

Built a new instance in AWS, bootstrapped the machine with Chef, attached the role to the machine, let chef-client do its thing, and allowed connections to it from devices. Next, god starts up, the file descriptor limit isn’t being raised for that process, and hundreds of thousands of connections are attempting to come up.

By default, processes spawned in Ubuntu have a limit of 1024 open files, but changing that limit via ulimit or /etc/security/limits.conf had no effect. I discovered that changes in /etc/security/limits.conf (and subsequently /etc/pam.d/common-session) are only applied to processes spawned by a login shell. Programs that launch on startup via “upstart” do not get these limits applied to them. Thankfully, upstart provides the limit stanza, which lets you modify some of these parameters, including the maximum number of open files. To see the limits of a process, grab its PID and cat /proc/<PID>/limits.

Adding this stanza to the relevant upstart job configuration will fix your woes:

limit nofile 100000 110000

Credit: http://bryanmarty.com/blog/2012/02/10/setting-nofile-limit-upstart/
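To check that the new limit actually took effect, you can pull the soft limit out of the limits file instead of eyeballing it. This is a sketch of my own, assuming the usual /proc/<PID>/limits layout (a "Max open files" row with the soft limit first); max_open_files is a made-up name.

```shell
# Hypothetical helper: print the soft "Max open files" limit
# from a /proc/<PID>/limits-style file.
max_open_files() {
  awk '/^Max open files/ { print $4 }' "$1"
}
```

Run it as `max_open_files /proc/<PID>/limits` with the PID of the process; with the stanza above in place it should report 100000.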

Git stash – snapshot

I’m constantly learning new things about git and I ran across this lovely piece of love this morning:

Take a snapshot of your current working tree without removing the changes from your tree. This is handy for refactoring where you can’t quite fit what you’ve done into a commit.

Add this to ~/.gitconfig under the [alias] section:

snapshot = !git stash save "snapshot: $(date)" && git stash apply "stash@{0}"

Running this:

$ git snapshot

Creates this stash:

stash@{0}: On feature/handy-git-tricks: snapshot: Mon Apr 8 12:39:06 BST 2013

And seemingly no changes to your working tree.

Credit: http://blog.apiaxle.com/post/handy-git-tips-to-stop-you-getting-fired/

Git A and Git B

Recently I ran into a problem.  I had two git repositories for the same project.  The two had diverged, each with its own work and history.  Here is a quick and dirty way to copy everything from git B into git A so you can make one giant squashed commit in git A.

rsync -a -f"- .git/" -f"+ *" source/ dest/

This will copy everything BUT .git directories.

Git commit away!

Fix? Mac OS X Lion, .local domain hell

Ran across this blog entry that showed a “fix” for OS X Lion, which consults Multicast DNS before looking in /etc/hosts.  At my employer we use a .local domain within our corporate environment, and this causes all sorts of pain with Capistrano deployments, development environments, etc.  The .local domain is technically reserved for Multicast DNS, but Lion doesn’t check /etc/hosts first and instead makes you wait up to 5 seconds before looking at your entries.

Begging IT to change our FQDN on our network takes time, so in the meantime there is a dscl command you can use to help.

sudo dscl localhost -create /Local/Default/Hosts/mydev.local IPAddress 127.0.0.1

To see all the currently defined hosts and their IPs:

sudo dscl localhost -list /Local/Default/Hosts IPAddress

And to remove a host:

sudo dscl localhost -delete /Local/Default/Hosts/mydev.local

Thanks to http://itand.me/ for this information.