Thursday, June 4, 2009

Shell commands to count field separators

So I suspected a few pipe-separated csv files where truncated inappropriately.

How could I count the number of fields of the last 2 rows?

tail -1 # will give you the last record
tail -2 | head -1 # will give you the 2nd to last record
tr -d -c '|' # deletes everything except pipe chars. -d is delete, -c is the 'complement' or inverse set of the list of chars I supply, which is only a single '|'

wc # of course, provides counts

The final commands:

tail -2 somefile.txt | head -1 | tr -d -c '|' | wc
tail -1 somefile.txt | tr -d -c '|' | wc

And I did find that the files were truncated

Wednesday, May 27, 2009

Python module of the week

Someone has been doing a tutorial of a python module per week for quite some time

Tuesday, May 26, 2009

Good list of vim tips

On any good list of vim tips, there's always a lot of stuff that I don't know.

Blogging Ottinger: Use vim like a pro

Sunday, May 10, 2009

Thursday, May 7, 2009

Find non-ascii chars in mysql

How can you tell whether a non-ascii character is sneaking into a field that should not contain any such characters?

Cast the field to ascii and see if it's any different:
select item_id, uri from items where convert(uri using ascii) <> uri;

Of course, you can/should preclude such a possibility by changing the character set of the field to ascii.

Tuesday, March 31, 2009

Alternative to echo

There is an alternative to using echo to pipe to stdin of another command, using the heretofore unknown-to-me '<<<' operator.

echo 'hello' | nl

is equivalent to

nl <<< 'hello'

I'm not sure that the latter is more useful. It looks more elegant if the stream of commands is long for them to be in a consistent left-to-right order, e.g.

echo 'hello' | nl | rev | tee /dev/stderr

looks a lot better to me than

nl <<< 'hello' | rev | tee /dev/stderr

Duplicate output to stderr

You can direct output to stdout to stderr by using tee to pipe it to /dev/stderr. Like so:

echo 'hello' | tee /dev/stderr

will output 'hello' twice, one in stdout and one in stderr

Wednesday, March 25, 2009

Read multiple newline types in csv

One really convenient module I use all the time is the csv module. You would think that parsing csv by yourself is easy, i.e., if you've never tried it, but it's really rather tricky.

People who use excel (sneer) often need to send me csv-formatted spreadsheets, and quite often the formatting is quite messed up. Often the line-endings are bizarre and frightening, and unrecognizable.

The solution is to add a 'U' flag to the open() call to enable universal newline support.

For instance, if you're opening a file for reading:

open('somefile.csv', 'rU')

will not care what the newline format is. Somehow it manages it all without me having to think about it. My thoughts are expensive.

Cool way to tell if two result sets are the same

You can use the pager setting in the mysql client to specify, that's right a pager.


I normally have mine set to less -FX. You can also do more interesting things, like set it to grep or grep -v or sed or anything else like that.

One cool use is to set it to md5sum - . That way you can compare two result sets to see if they are identical. I formerly would put result sets into temporary tables then run checksum table on the temporary table, but this way requires several fewer steps than that.

Tuesday, March 24, 2009

Monday, March 16, 2009


Wednesday, March 11, 2009

edit a command in vim

If you're working with a really long command, you can edit it in vim (or whatever is specified in $EDITOR) by hitting cntrl-x + cntrl-e.

This only works in bash.

It would be nice if the same thing existed for readline, so it could be used in the mysql client. It would also be cool if I could do something like this over multiple lines in the python shell.

delete a specific line

You can delete a specific line from any file or stream using sed.

sed '10d' will delete line 10 of a file or stream.

Using sed on a file, the -i flag will replace the contents of the file in-place. If you specify an extension after the -i flag, it will back up the file to a copy with that extension.

I used this recently when attempting to build a db dump from wikipedia. The sql dumps from mysql version 4.1.x have a line like '-----------' on line 4 of the dump. My version of mysql apparently requires a newer sql spec which requires a space after the first two dashes in order to be a legitimate comment, so it was failing to parse the dump.

The complete command I used, including using pv to add a progress bar:

pv enwiki-20090306-redirect.sql.gz | gunzip -c - | sed '4d' | mysql wikipedia

Monday, March 9, 2009

some ping fun

The -a flag will tell ping to print a bell character whenever a ping comes back.

This is useful if you want to check when a server comes back online, and beep at you when that is the case. You can also use the -i flag to check ever 30 seconds:

$ ping -a -i 30 some_host

On OS X, you can make this a little more interesting (though there are equivalents on other platforms). ping will return successfully if any packets were returned. Using the && operator we can have something else fire off if the pinging works. The -o flag tells ping to exit when it first gets a successful ping reply.

$ ping -o -i 30 some_host && say 'the host has come back online'

Incidentally I used to have the make command aliased like so:

$ make && say -v 'good news' make complete || say -v 'bad news' make complete

This way, after a long make compile run, my computer would say "make complete" in a happy voice if the build was successful or a sad voice if the build failed.

two ways to create 10 text files

The {} operator will be expanded out by the shell,

$ echo HELLO{1,2,3}
HELLO1 HELLO2 HELLO3

You can use the .. operator to expand sequences, like so:
$ echo {1..10}
1 2 3 4 5 6 7 8 9 10

If you prefix some zeroes in there, it will zerofill:
$ echo {01..10}
01 02 03 04 05 06 07 08 09 10
$ echo {01..10}.txt
01.txt 02.txt 03.txt 04.txt 05.txt 06.txt 07.txt 08.txt 09.txt 10.txt

So the command to create 10 empty text files is:

$ touch {01..10}.txt

Another way to do it is to use the seq command (which does not appear to be bundled with BSD OS's like OS X).

seq prints out numbers in a sequence.

$ seq 1 3
1
2
3

seq accepts many options. One of those is the -f flag, which will take a string to format the number according to C printf() format, like so:

$ seq -f "%02g.txt" 1 3
01.txt
02.txt
03.txt

It's fairly straightforward to pipe this output through xargs to turn it into a list of arguments for touch:

$ seq -f "%02g.txt" 1 10 | xargs touch

Monday, February 23, 2009

Fun application of tr

Lately youtube has been shutting down utilities that let you download videos easily.

So, what I've been doing is using the LiveHTTPHeaders plugin to Firefox to just copy out whatever .flv file it's requesting. However, this always copies out an extra line break which messes up the shell.

To filter out line breaks from the clipboard (OS X only):

pbpaste | tr -d '\n' | pbcopy

Friday, February 20, 2009

Wednesday, February 18, 2009

XOR swap algorithm

There is a way to swap two variables without needing a swap variable. It's called the XOR swap algorithm.

X := X XOR Y
Y := X XOR Y
X := X XOR Y

This allows you to swap two columns in a table if you can't alter the table to add a column temporarily to swap them. I'm not sure how mysql handles user variables in update statements.

Shell braces

most shells with expand braces with commas, like so:

$ echo file{1,2,a}
file1 file2 filea

You can even nest them:

$ echo file{1,2,3{1,2}}
file1 file2 file31 file32

There are several cool uses for this trick.

The first to copy a file to itself plus .bak:

cp /path/to/somefile{,.bak}

You can also combine this with backtick to version a file with a date:

cp /path/to/somefile{,-`date -I`}

Tuesday, February 17, 2009

ssh-copy-id

If you use ssh keys to achieve a higher level of security (or more likely, achieve passwordless logins), you are used to having to copy your public key from one server to another, very likely using some completely insecure method.

use instead ssh-copy-id, which will prompt you to log in to the remote machine one last time using your existing login and will copy your public key into the correct file securely using ssh.