Wednesday, April 14, 2010

Bulk COPY a CSV into PostgreSQL, skipping first row

Looked for a solution to this yesterday and couldn't find one. Asked my Linux guru Jeremy today and he had an easy solution, so this might be helpful to others.

The scenario: you have a big CSV file and you want to bulk copy it into PostgreSQL, but the first row of the file isn't data; it's the column names. In my case the text file is 65 megs, so it's not like you can just open it in a text editor and delete the offending line. (The data happens to be the combined US and Canada zip/postal code database from ZipInfo.com, fyi.)

SQL Server has a bulk insert GUI that lets you specify a starting row; I needed that same functionality here.

Solution:

Use wc to find out how many rows are in your file:

$ wc ZCUG.TXT
872135 1871133 69105493 ZCUG.TXT
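
Those three numbers are lines, words, and bytes. If all you care about is the line count, wc -l prints just that one:

$ wc -l ZCUG.TXT
872135 ZCUG.TXT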


That first number returned, 872135 in my case, is the number of lines in the file. Subtract one and tail that many lines, outputting to a new file:

$ tail -n 872134 ZCUG.TXT > ZCUG-trimmed.txt

Boom! A new file without the row of column names.
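
Even easier, it turns out tail can count from the top with a plus sign, so you can skip the header row in one step without running wc at all:

$ tail -n +2 ZCUG.TXT > ZCUG-trimmed.txt

The -n +2 means "start output at line 2," which is exactly the skip-the-first-row behavior we're after.

From there the bulk load itself is a one-liner with psql's \copy. A minimal sketch, assuming a database named mydb and a table named zipcodes whose columns line up with the file (both names are placeholders, not from my actual import):

$ # mydb and zipcodes are placeholder names
$ psql -d mydb -c "\copy zipcodes FROM 'ZCUG-trimmed.txt' WITH CSV"

And newer versions of PostgreSQL support a HEADER option in COPY's CSV mode, which skips that first row for you with no trimming needed:

$ psql -d mydb -c "\copy zipcodes FROM 'ZCUG.TXT' WITH CSV HEADER"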