Wednesday, March 27, 2013

Solaris redundant NFS mounts

Stardate 90840.8 NFS servers with lots of disks, sharing files out to multiple clients, is a pattern followed everywhere. Eventually your users begin not only to use these remote files, but to put NFS directories into their PATH variable. This causes problems whenever you need to patch or reboot the NFS server, because every shell launched by a user whose PATH looks into the NFS directories will hang forever (this does assume you are mounting hard).

You can minimize the effect of this by using redundant mount information.


/usr/local -ro,hard,intr,suid
/usr/local -ro,hard,intr,suid,
It is best to maintain the idea of a primary and a secondary server, at least for administration: modify only the primary, and rsync to the secondary. Mount read-only. The mount appears in ``mount`` like this:
/usr/local on, remote/read only/setuid/devices/rstchown/hard/intr/xattr/dev=5a0000e on Wed Mar 27 15:35:21 2013
Note that this is mounted from both servers. Packets get sent to both servers, and the first to respond with valid information is reported to the system. This can make for some bizarre behavior if you use read-write mounts.
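For reference, here is a sketch of what a redundant, read-only entry can look like in a Solaris automounter direct map. The hostnames nfs1 and nfs2 and the /export/local path are placeholders for illustration, not real server names:

```
# Hypothetical /etc/auto_direct entry; nfs1 and nfs2 export identical data.
/usr/local  -ro,hard,intr,suid  nfs1,nfs2:/export/local
```

When both replicas export the same path, the servers can be listed comma-separated in front of a single path, and the client picks whichever one answers.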

It is entirely possible to use something like DRBD between NFS servers (not on Solaris, obviously) to make this workable with read-write mounts. I have not done this personally.

Monday, March 18, 2013

Cascadia IT Conf

Stardate 90814.44

This weekend we attended CasItConf13. I had a blast and met a lot of really cool people. I attended presentations on Logstash, IPv6, Chef, and more. Jordan Sissel, in particular, did a great job of presenting Logstash. After his talk we met up and had a neat conversation. He showed me an app he had created called fingerpoken. It's a bit out of date and we had to do some hacks, but I was able to get it up and running in a half-hour lunch break and still have time to demolish some tasty lunch provided by the wonderful folks over at Puppet Labs. Fingerpoken is an app that lets you send mouse and keyboard events to a computer from a smartphone.

And that's really what it's all about. Is the tool simple and easy enough that you can get it going in a crunch? Are all the nonintuitive parts ripped out and replaced with sane defaults, so the tool just 'goes'? In fingerpoken's case, not really. We had to do some:

sudo ln -s /usr/lib/ /usr/lib/
But what is the point of having the author of your tool nearby, if not to tell you to do that? And yes, the ABI is evidently close enough to just work in that case.

I am very impressed that I was able to get such high-level functionality out of a tool in a short period of time and under pressure. If your tool passes the 'setup at lunch at a conference' test, you're doing pretty dang good. If it doesn't, look for places to streamline it. I'm happy to test your random tool, please let me know.

My talk, on the Computer Action Team/Braindump, is available on my GitHub, and you can download the PDF from here.

In other news, it seems that GitHub no longer allows you to download raw files out of repositories if they are above a certain size. Possibly more on that later.

Thursday, March 7, 2013

Debian packaging

Stardate: 90784.6

Git-sync is a Ruby script we use at work for managing git repos; it is covered in an earlier post. I got tired of ensuring it as a file in Puppet and decided to make a Debian package. Here is a summary of how to make a simple Debian package containing just a single file. Note that the answer to this Stack Overflow question is the source of most of my knowledge, so this will just be annotations and extensions to it.

Debian/Ubuntu packaging (on an ubuntu system) required me to install a single package: devscripts.

At a high level, Debian packaging involves creating a 'debian' folder in your source tree and putting several metadata files in it. Figuring out the precise contents of these files is the challenge of packaging. I recommend you use the 'apt-get source git' command to get the source of a working package (git in this case) to compare against your own metadata files.

Debian/Ubuntu packaging using debuild creates files one level above your current working directory (wtf, debian). So the first step is to make a build directory:

cd ~/devel
mkdir git-sync-build
Procure the source:
nibz@darktemplar:~/devel/git-sync-build$ git clone
nibz@darktemplar:~/devel/git-sync-build$ ls
nibz@darktemplar:~/devel/git-sync-build$ cd git-sync
nibz@darktemplar:~/devel/git-sync-build/git-sync$ mkdir debian

All of the metadata files that debuild, the utility that will actually build the .deb, needs are going to be in the debian directory.

The first file to create is debian/changelog. This file is created with the dch utility; run it from the git-sync directory. It will open vim with a template that looks like this. Many fields here need to be changed.

dch --create

PACKAGE (VERSION) RELEASE; urgency=low

  * Initial release. (Closes: #XXXXXX)

 -- Spencer Krum   Thu, 07 Mar 2013 01:40:18 -0800
PACKAGE refers to the name of the package. Replace the word PACKAGE with the name you want your package to register itself as; in my case I will use 'git-sync'. The package name must be lower case. VERSION must be replaced with a version number. I'm using 1.0.1 for this, since it is the second release of git-sync, but the changes are very minor. There are long articles on the internet about version numbering; it's not my place to comment here. The RELEASE variable needs to be replaced with a Debian or Ubuntu codename such as 'precise' or 'wheezy'. I have no idea what urgency is, but setting it to low doesn't seem to hurt anything. Maybe this is how you tell apt/dpkg about security updates. The initial release stuff is fine.

The name is a bit tricky. Later on we will gpg-sign the package. Make sure the name and email in the changelog match exactly the name and email on your gpg key, or the debuild utility won't attempt to have you sign it at all. My changelog looks like this:
git-sync (1.0.0) precise; urgency=low

  * Initial release. 

 -- Spencer Krum   Wed, 06 Mar 2013 16:46:14 -0800
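To make the field layout concrete, the header line is `package (version) release; urgency=value`. As an illustration (not part of the packaging workflow), you can pick the fields apart with sed:

```shell
# Dissect a debian/changelog header line with sed.
line='git-sync (1.0.0) precise; urgency=low'
pkg=$(echo "$line" | sed -n 's/^\([a-z0-9.+-]*\) (.*/\1/p')   # package name
ver=$(echo "$line" | sed -n 's/.*(\([^)]*\)).*/\1/p')         # version
rel=$(echo "$line" | sed -n 's/.*) \([a-z]*\);.*/\1/p')       # release codename
echo "$pkg $ver $rel"    # git-sync 1.0.0 precise
```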

Next create a debian/copyright file:
Upstream-Name: myScript
Upstream-Contact: Name, 

Files: *
Copyright: 2011, Name, 
License: (GPL-2+ | LGPL-2 | GPL-3 | whatever)
 Full text of the license.
 Unless it is a common license, in which case it can be found in /usr/share/common-licenses.
I elected for the Apache 2 license, using the two-paragraph version of that license, and I gave credit where it was due. Fill out this file as you see fit.

Next create a debian/compat file:

nibz@darktemplar:~/devel/git-sync-build/git-sync/debian$ echo 7 > compat
Next create the rules file. This file seems to be the work-doer in debian packaging. It is evaluated by make, which is picky, so make sure the indented line is a real tab (copying from my blog will probably fail). The --with python is... well, I have no idea. I traced it to a python.pm (.pm is a perlism) deep within /usr. Since I am packaging a ruby script, I just removed it.

Example from stackoverflow

#!/usr/bin/make -f
%:
	dh $@ --with python2

git-sync version

#!/usr/bin/make -f
%:
	dh $@
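The `%:` target is a make pattern rule that matches any requested target name and hands it off to one recipe, which is how debian/rules forwards 'build', 'binary', and friends to dh. A toy makefile (illustrative only, not a real rules file) shows the delegation:

```shell
#!/bin/sh
# A %: pattern rule forwards every requested target to a single recipe.
tmp=$(mktemp -d)
printf '%%:\n\techo "delegating target $@"\n' > "$tmp/Makefile"
make -s -C "$tmp" build binary
```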
Next make the control file. Make the natural substitutions here. I guessed on section and it just sorta worked.
nibz@darktemplar:~/devel/git-sync-build/git-sync/debian$ cat control 
Source: git-sync
Section: ruby
Priority: optional
Maintainer: Spencer Krum, 
Build-Depends: debhelper (>= 7),
               ruby (>= 1.8.7)
Standards-Version: 3.9.2
X-Ruby-Version: >= 1.8.7

Package: git-sync
Architecture: all
Section: ruby
Depends: ruby, ${misc:Depends}, ${python:Depends}
Description: Git syncing script, pull based
  Git-sync allows git repositories to be kept in sync via git
  hooks or other means. Pull based, able to handle force pushes
  and submodules
Next make the install file. I went with the default from the stackoverflow post. I attempted some simple modifications (moving the file to /usr/local/bin) and that made it fail, so this file is evidently pretty finicky.
nibz@darktemplar:~/devel/git-sync-build/git-sync$ cat debian/install 
git-sync usr/bin

Now you can build the debian package.

nibz@darktemplar:~/devel/git-sync-build/git-sync$ debuild --no-tgz-check
If all went well, it should ask you to decrypt your gpg key twice and build a package in the directory one level up.
nibz@darktemplar:~/devel/git-sync-build/git-sync$ ls ..
git-sync          git-sync_1.0.0.dsc
git-sync_1.0.0_all.deb  git-sync_1.0.0_amd64.changes  git-sync_1.0.0.tar.gz
You now have a shiny .deb file that can be installed with dpkg -i git-sync_1.0.0_all.deb

It is easy to put this in a Launchpad PPA if you have a Launchpad account. From your Launchpad homepage, press "Create new PPA" and fill out the form.

Next build a source package. Launchpad PPAs take source packages and build binary packages on launchpad servers. Build it with:

nibz@darktemplar:~/devel/git-sync-build/git-sync$ debuild -S
It should go through the gpg motions again and build a source package. Then you should be able to run something like (with your Launchpad username and the name of your PPA):
dput ppa:krum-spencer/git-sync-ppa git-sync_1.0.0_source.changes

Happy Packaging!


Stardate: 90784.4428183

Where I work (read as: play) we use a lot of git. As operators, we often have a service running with its configs in git. A common pattern we use is a post-receive hook on the git repository, set up to update the git checkout on a remote server. We accomplish this with a post-receive hook that sshes into the remote server and calls a script called git-sync with some options. The git-sync github project is forked from the puppet-sync project, which we use specifically for puppet dynamic git environments. Hunner <3 More dynamic git environments with puppet. Finch <3.

A hook for a project goes in the hooks/post-receive file of the git server's bare repo. Let's look at one now:

Example git post-receive hook

## File: awkwardly incorrect

REPONAME=`basename $PWD | sed 's/.git$//'`
SSH_ARGS="-i /shadow/home/git/.ssh/"

while read oldrev newrev refname
do
  BRANCH=`echo $refname | sed -n 's/^refs\/heads\///p'`
  if [ $BRANCH != "master" ]; then
    echo "Branch is not master, therefore not pushing to nagios"
    exit 0
  fi
  [ "$newrev" -eq 0 ] 2> /dev/null && DELETE='--delete' || DELETE=''
  ssh $SSH_ARGS git-sync \
    --branch "$BRANCH" \
    --repository "$REPO" \
    --deploy "$DEPLOY" \
    $DELETE
done

ssh '/etc/init.d/nagios3 reload'

The hook will exit before doing anything if the branch is not master; if it is, it will run the git-sync script remotely on the nagios host, then go back in to bounce the nagios service.
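The branch check hinges on that sed expression stripping the refs/heads/ prefix from the pushed ref. Pulled out on its own, it behaves like this:

```shell
# Strip the refs/heads/ prefix to get a branch name, as the hook does.
refname='refs/heads/master'
BRANCH=$(echo "$refname" | sed -n 's/^refs\/heads\///p')
echo "$BRANCH"    # master

# Tag pushes don't match the pattern, so the result is empty:
TAG=$(echo 'refs/tags/v1.0' | sed -n 's/^refs\/heads\///p')
echo "tag result: '$TAG'"    # tag result: ''
```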

The git-sync script essentially performs a

git fetch; git checkout HEAD
It doesn't worry itself with merging, and it is submodule aware.
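Here is a toy end-to-end demonstration of the pull-based idea (my own sketch, not the actual git-sync script; all paths and names are made up): build an upstream repo, clone it, advance upstream, then sync the clone with a fetch plus hard reset so local divergence is simply discarded:

```shell
#!/bin/sh
# Sketch of a pull-based sync: the deploy checkout is forced to match
# upstream, with no merging.
set -e
tmp=$(mktemp -d)

# Upstream repo with one file.
git init -q "$tmp/upstream"
cd "$tmp/upstream"
git config user.email git-sync@example.com
git config user.name git-sync
echo v1 > service.cfg
git add service.cfg && git commit -qm 'v1'

# The deploy checkout a hook would keep in sync.
git clone -q "$tmp/upstream" "$tmp/deploy"

# Upstream moves on (imagine a push arriving).
echo v2 > service.cfg
git commit -qam 'v2'

# The sync step: fetch, then hard-reset to the upstream branch.
cd "$tmp/deploy"
git fetch -q origin
git reset -q --hard '@{u}'
cat service.cfg    # v2
```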

A file, .git-sync-stamp, must be created by the administrator of the system; this is how git-sync knows it is in charge of managing the repository. It is definitely not recommended that you add this file to git, but that should more or less work if you never want to think about it. I also wrote a puppet defined type to manage the stamp, the initial vcsrepo, and the public key file for you.

A puppet defined type to initialize gitsync-managed folders

define gitsync::gitsync(
  $user,
  $source,
  $deploy,
  $public_key,
  $public_key_type = 'ssh-rsa',
) {

  # Authorize the git server's key so it can trigger syncs over ssh.
  ssh_authorized_key { "${user}-${name}-gitsync":
    ensure => present,
    user   => $user,
    type   => $public_key_type,
    key    => $public_key,
  }

  # Initial clone of the repository.
  vcsrepo { $deploy:
    ensure   => present,
    provider => git,
    user     => $user,
    source   => $source,
    require  => Ssh_authorized_key["${user}-${name}-gitsync"],
  }

  # The stamp file that tells git-sync it owns this checkout.
  file { "${deploy}/.git-sync-stamp":
    ensure  => present,
    owner   => $user,
    mode    => '0644',
    require => Vcsrepo[$deploy],
  }
}
The last thing to note is that I didn't write git-sync. I've modified it, but it was mostly written by Reid Vandewielle and others. Marut <3. Enjoy!

Saturday, March 2, 2013

Cisco Out Of Memory

Stardate: 90772.3

Today (well, yesterday) our primary router ran out of memory. We haven't fixed the problem yet (I hope that will be the subject of a follow-up post), but for right now I want to take you through detection, characterization, and mitigation.

Detection. The way I found out about the problem was via ssh.

Attempting to ssh into the router running out of memory.

> ssh multiplexor.seas
nibz@multiplexor.seas's password:
Permission denied, please try again.
nibz@multiplexor.seas's password:
Connection closed by 2610:10:0:2::210
For anyone familiar with sshing into Ciscos, this is not how it normally looks. Usually you get three attempts with just 'Password:' and then one with your username visible.

Attempting to ssh into a router not running out of memory.

> ssh nibz@wopr.seas
nibz@wopr.seas's password:
Connection closed by
I verified that it wasn't a knowing-the-password problem by using another account on the router. Then I connected a serial console to the router and immediately found out-of-memory logs.

Console logs on the router.

10w0d: %AAA-3-ACCT_LOW_MEM_UID_FAIL: AAA unable to create UID for incoming calls due to insufficient processor memory

Logs sent to syslog.

Mar  2 00:43:52 multiplexor 4309463: 10w1d: %SYS-2-MALLOCFAIL: Memory allocation of 128768 bytes failed from 0x1A8C110, alignment 0 
Mar  2 00:44:24 multiplexor 4309499: 10w1d: %SYS-2-MALLOCFAIL: Memory allocation of 128768 bytes failed from 0x1A8C110, alignment 0 
Mar  2 00:47:37 multiplexor 4309643: 10w1d: %SYS-2-MALLOCFAIL: Memory allocation of 395648 bytes failed from 0x1AA03FC, alignment 0 
Mar  2 02:18:33 multiplexor 4313756: 10w1d: %SYS-2-MALLOCFAIL: Memory allocation of 395648 bytes failed from 0x1AA03FC, alignment 
I ran the 'show proc mem' command on the router to get a picture of the memory use of the router.

Show proc mem.

multiplexor#show proc mem
Processor Pool Total:  177300444 Used:  174845504 Free:    2454940
      I/O Pool Total:   16777216 Used:   13261296 Free:    3515920
Driver te Pool Total:    4194304 Used:         40 Free:    4194264
 PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
   0   0  108150192   43169524   58720200          0          0 *Init*
   0   0      12492    2712616      12492          0          0 *Sched*
   0   0  399177972  389135628    8911036   14228691    1490354 *Dead*
   0   0          0          0  102305848          0          0 *MallocLite*
   1   0  973921416  973821100     224768          0          0 Chunk Manager
   2   0        232        232       4160          0          0 Load Meter
   3   0          0          0       7076          0          0 DHCPD Timer
   4   0       4712       6732      11692          0          0 Check heaps
   5   0    7862444   49770056      13540    6270020   28190703 Pool Manager
   6   0          0          0       7160          0          0 DiscardQ Backgro
   7   0        232        232       7160          0          0 Timers
   8   0          0          0       4160          0          0 WATCH_AFS
   9   0        284        728       7160          0          0 License Client N
  10   0 2332421068 2332422156       7168          0          0 Licensing Auto U
  11   0    1482732    1483016       7160          0          0 Image License br
  12   0 2344349400 3601318192     169876     157356          0 ARP Input
  13   0  550320160  550382256       7160          0          0 ARP Background
  14   0          0          0       7168          0          0 CEF MIB API
  15   0          0          0       7160          0          0 AAA_SERVER_DEADT
This shows that the router is indeed running very low on memory. How did we get here? Monitoring + SNMP + RRDtool to the rescue!

Doing some quick estimation, it looks like the router loses about a MB of free RAM every 18 hours. RRDtool isn't the best, and getting the big-picture graph is hard to do, but basically it has been losing free RAM at this rate for a couple of weeks.

Finally we get a show tech-support off of this thing.

multiplexor# show tech-support | redirect tftp://
The redirect to tftp is a really cool pattern for getting information off of a Cisco device. The tech-support output was about 50,000 lines.

I will do a follow-up post when I figure out what's going on.

Update 3-7-13:

The router ran completely out of memory. Even on console all I got was:

%% Low on memory; try again later

%% Low on memory; try again later

%% Low on memory; try again later
It was still happily switching and routing at this point, however. We rebooted it because it was Saturday evening, and better to have it happen at a time of our choosing than to break iSCSI unexpectedly later in the week. Upon reboot, the system returned to full functionality, but we can tell from the zenoss graphs that it is still leaking memory at a rate of 1 MB every 18 hours. At this rate it will need to be rebooted again in 10 weeks. We have opened a case with TAC. I will update again if anything comes of this.
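The ten-week figure is just leak-rate arithmetic. A back-of-the-envelope sketch (the ~94 MB of effective headroom is an assumed figure inferred from the observed lifetime, not something read off the router):

```shell
#!/bin/sh
# 1 MB of free RAM lost every 18 hours; assume ~94 MB of headroom before
# allocations start failing (hypothetical figure for illustration).
LEAK_HOURS_PER_MB=18
HEADROOM_MB=94
HOURS=$((LEAK_HOURS_PER_MB * HEADROOM_MB))
DAYS=$((HOURS / 24))
WEEKS=$((DAYS / 7))
echo "${HOURS} hours = ${DAYS} days = about ${WEEKS} weeks between reboots"
```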

Update 5-17-13:

We still have not fixed the problem. The router can go about 10 weeks before it needs a reboot. This is in an educational setting with 12-week terms, meaning we need to reboot our core router at least once a term. Wheeee. We've been on the horn with Cisco, who has had numerous techs look at it and has even replaced the hardware, but the problem remains. Anyone with ideas is welcome to contact me privately.