Thursday, June 4, 2009

Shell commands to count field separators

So I suspected a few pipe-separated csv files where truncated inappropriately.

How could I count the number of fields of the last 2 rows?

tail -1 # will give you the last record
tail -2 | head -1 # will give you the 2nd to last record
tr -d -c '|' # deletes everything except pipe chars. -d is delete, -c is the 'complement' or inverse set of the list of chars I supply, which is only a single '|'

wc # of course, provides counts

The final commands:

tail -2 somefile.txt | head -1 | tr -d -c '|' | wc
tail -1 somefile.txt | tr -d -c '|' | wc

And I did find that the files were truncated

Wednesday, May 27, 2009

Python module of the week

Someone has been doing a tutorial of a python module per week for quite some time

Tuesday, May 26, 2009

Good list of vim tips

On any good list of vim tips, there's always a lot of stuff that I don't know.

Blogging Ottinger: Use vim like a pro

Sunday, May 10, 2009

Thursday, May 7, 2009

Find non-ascii chars in mysql

How can you tell whether a non-ascii character is sneaking into a field that should not contain any such characters?

Cast the field to ascii and see if it's any different:
select item_id, uri from items where convert(uri using ascii) <> uri;

Of course, you can/should preclude such a possibility by changing the character set of the field to ascii.

Tuesday, March 31, 2009

Alternative to echo

There is an alternative to using echo to pipe to stdin of another command, using the heretofore unknown-to-me '<<<' operator.

echo 'hello' | nl

is equivalent to

nl <<< 'hello'

I'm not sure that the latter is more useful. It looks more elegant if the stream of commands is long for them to be in a consistent left-to-right order, e.g.

echo 'hello' | nl | rev | tee /dev/stderr

looks a lot better to me than

nl <<< 'hello' | rev | tee /dev/stderr

Duplicate output to stderr

You can direct output to stdout to stderr by using tee to pipe it to /dev/stderr. Like so:

echo 'hello' | tee /dev/stderr

will output 'hello' twice, one in stdout and one in stderr

Wednesday, March 25, 2009

Read multiple newline types in csv

One really convenient module I use all the time is the csv module. You would think that parsing csv by yourself is easy, i.e., if you've never tried it, but it's really rather tricky.

People who use excel (sneer) often need to send me csv-formatted spreadsheets, and quite often the formatting is quite messed up. Often the line-endings are bizarre and frightening, and unrecognizable.

The solution is to add a 'U' flag to the open() call to enable universal newline support.

For instance, if you're opening a file for reading:

open('somefile.csv', 'rU')

will not care what the newline format is. Somehow it manages it all without me having to think about it. My thoughts are expensive.

Cool way to tell if two result sets are the same

You can use the pager setting in the mysql client to specify, that's right a pager.


I normally have mine set to less -FX. You can also do more interesting things, like set it to grep or grep -v or sed or anything else like that.

One cool use is to set it to md5sum - . That way you can compare two result sets to see if they are identical. I formerly would put result sets into temporary tables then run checksum table on the temporary table, but this way requires several fewer steps than that.

Tuesday, March 24, 2009

Monday, March 16, 2009


Wednesday, March 11, 2009

edit a command in vim

If you're working with a really long command, you can edit it in vim (or whatever is specified in $EDITOR) by hitting cntrl-x + cntrl-e.

This only works in bash.

It would be nice if the same thing existed for readline, so it could be used in the mysql client. It would also be cool if I could do something like this over multiple lines in the python shell.

delete a specific line

You can delete a specific line from any file or stream using sed.

sed '10d' will delete line 10 of a file or stream.

Using sed on a file, the -i flag will replace the contents of the file in-place. If you specify an extension after the -i flag, it will back up the file to a copy with that extension.

I used this recently when attempting to build a db dump from wikipedia. The sql dumps from mysql version 4.1.x have a line like '-----------' on line 4 of the dump. My version of mysql apparently requires a newer sql spec which requires a space after the first two dashes in order to be a legitimate comment, so it was failing to parse the dump.

The complete command I used, including using pv to add a progress bar:

pv enwiki-20090306-redirect.sql.gz | gunzip -c - | sed '4d' | mysql wikipedia

Monday, March 9, 2009

some ping fun

The -a flag will tell ping to print a bell character whenever a ping comes back.

This is useful if you want to check when a server comes back online, and beep at you when that is the case. You can also use the -i flag to check ever 30 seconds:

$ ping -a -i 30 some_host

On OS X, you can make this a little more interesting (though there are equivalents on other platforms). ping will return successfully if any packets were returned. Using the && operator we can have something else fire off if the pinging works. The -o flag tells ping to exit when it first gets a successful ping reply.

$ ping -o -i 30 some_host && say 'the host has come back online'

Incidentally I used to have the make command aliased like so:

$ make && say -v 'good news' make complete || say -v 'bad news' make complete

This way, after a long make compile run, my computer would say "make complete" in a happy voice if the build was successful or a sad voice if the build failed.

two ways to create 10 text files

The {} operator will be expanded out by the shell,

$ echo HELLO{1,2,3}
HELLO1 HELLO2 HELLO3

You can use the .. operator to expand sequences, like so:
$ echo {1..10}
1 2 3 4 5 6 7 8 9 10

If you prefix some zeroes in there, it will zerofill:
$ echo {01..10}
01 02 03 04 05 06 07 08 09 10
$ echo {01..10}.txt
01.txt 02.txt 03.txt 04.txt 05.txt 06.txt 07.txt 08.txt 09.txt 10.txt

So the command to create 10 empty text files is:

$ touch {01..10}.txt

Another way to do it is to use the seq command (which does not appear to be bundled with BSD OS's like OS X).

seq prints out numbers in a sequence.

$ seq 1 3
1
2
3

seq accepts many options. One of those is the -f flag, which will take a string to format the number according to C printf() format, like so:

$ seq -f "%02g.txt" 1 3
01.txt
02.txt
03.txt

It's fairly straightforward to pipe this output through xargs to turn it into a list of arguments for touch:

$ seq -f "%02g.txt" 1 10 | xargs touch

Monday, February 23, 2009

Fun application of tr

Lately youtube has been shutting down utilities that let you download videos easily.

So, what I've been doing is using the LiveHTTPHeaders plugin to Firefox to just copy out whatever .flv file it's requesting. However, this always copies out an extra line break which messes up the shell.

To filter out line breaks from the clipboard (OS X only):

pbpaste | tr -d '\n' | pbcopy

Friday, February 20, 2009

Wednesday, February 18, 2009

XOR swap algorithm

There is a way to swap two variables without needing a swap variable. It's called the XOR swap algorithm.

X := X XOR Y
Y := X XOR Y
X := X XOR Y

This allows you to swap two columns in a table if you can't alter the table to add a column temporarily to swap them. I'm not sure how mysql handles user variables in update statements.

Shell braces

most shells with expand braces with commas, like so:

$ echo file{1,2,a}
file1 file2 filea

You can even nest them:

$ echo file{1,2,3{1,2}}
file1 file2 file31 file32

There are several cool uses for this trick.

The first to copy a file to itself plus .bak:

cp /path/to/somefile{,.bak}

You can also combine this with backtick to version a file with a date:

cp /path/to/somefile{,-`date -I`}

Tuesday, February 17, 2009

ssh-copy-id

If you use ssh keys to achieve a higher level of security (or more likely, achieve passwordless logins), you are used to having to copy your public key from one server to another, very likely using some completely insecure method.

use instead ssh-copy-id, which will prompt you to log in to the remote machine one last time using your existing login and will copy your public key into the correct file securely using ssh.

bulk rename utility

There is a bulk rename utility which is installed by default in many distributions calls 'rename'. It expects a perl regular expression to describe the substitution.

You can use this to, say, change the file extension of several files:

rename 's/\.htm$/.html/' *.html

It's written in perl and was originally written by Larry Wall.

Monday, February 16, 2009

Disposable variables

Ever have a bunch of variables you get returned in a tuple, and you only want some of them? So you assign them to some useless variable like 'nothing' or 'deleteme' and hope that the optimizer will instantly garbage collect them away?

(user, user_type, ingorethis) = somefunction()

The correct way to do this is to use the _ variable. (underscore). Underscore is equivalent to perl's $_ variable: it contains the results of the last evaluated expression. This is sometimes useful if you're using the python shell as a calculator. So, it will instantly be overwritten.

(user, user_type, _) = somefunction()

Thursday, February 5, 2009

Do a mysqldump on a limited number of records

To limit a mysqldump to dump only N records:

mysqldump --where 'true limit N'

Tuesday, January 20, 2009

Change we can believe in

I've finally found the words to articulate how I feel right now.

I feel like that great air-breathing fish, Admiral Ackbar; as he leans back and sighs with relief and awe, the other fish-men in the rebel corsair cheer as the disabled super star destroyer crashes into the death star.

Friday, January 16, 2009

Why PHP won

I've run out of patience with PHP, but I've always recognized the coexistence of its strengths and its weaknesses that narrower minds refused to acknowledge. Some CTO has a long post essentially arguing this.

I first learned PHP in January 2000. I remember the date exactly. This was only shortly after 4.0 had just come out. I had some acquaintances who were using PHP 3 but I had never tried it.

For awhile I loved PHP. Being used to the fixed-memory limitations of the data structures of C/C++ and mysterious segfaults I often couldn't understand, I found PHP to be incredibly liberating.

Wednesday, January 14, 2009

stupid mysql tricks


what do you know, screencasting works!

Monday, January 5, 2009

The new way to do show processlist

We all know and love the show processlist command to see what our queries are doing.

However, most of the show commands are being deprecated in favor of new information_schema virtual tables. (I believe the technical term is actually "system view".)

The new information_schema equivalent of show processlist is:

select * from information_schema.processlist \G

*************************** 1. row ***************************
     ID: 259584
   USER: readonly
   HOST: 192.168.0.186:44761
     DB: bln
COMMAND: Query
   TIME: 222
  STATE: Writing to net
   INFO: SELECT /*!40001 SQL_NO_CACHE */ * FROM `person5`
*************************** 2. row ***************************
     ID: 259468
   USER: readonly
   HOST: 192.168.0.108:43492
     DB: bln
COMMAND: Query
   TIME: 0
  STATE: executing
   INFO: select * from information_schema.processlist

The ID column appears to be the connection_id of the thread that is running the query.

The advantage of this method is that we can filter the results with a where clause, like so:

select * from information_schema.processlist where command <> 'Sleep' and id <> connection_id();

which will show only those queries that are not in sleep mode and excludes the current process.