Intro & Motivation¶
I was using the column
command to columnate data similar to this:
ID,NAME,POWER,SHIELD
1,Yoda,100,100
2,,,80
3,Aayla,97,58
4,,70,
5,Ahsoka,75,88
sample data
Of special importance for this example are the empty fields.
GNU column
command worked as I expected:
$ column -ts, ./data.csv
ID NAME POWER SHIELD
1 Yoda 100 100
2 80
3 Aayla 97 58
4 70
5 Ahsoka 75 88
GNU column output
80 was correctly placed on the SHIELD column for ID 2, and 70 was correctly placed on column POWER for ID 4.
But macOS column
has a simpler implementation, and it ignored the empty fields, placing data in the wrong columns:
$ column -ts, ./data.csv
ID NAME POWER SHIELD
1 Yoda 100 100
2 80
3 Aayla 97 58
4 70
5 Ahsoka 75 88
macOS column output
Observe that 80 for ID 2, and 70 for ID 4 are all in the incorrect NAME column.
Sed to the rescue¶
One possible solution would be to first use fill in those empty fields with an empty space (or some other character or characters that would not mess up with our data) so that column
on macOS will not consider them to be empty fields.
Let’s do it with sed
.
First, let’s try with a sample input:
$ echo '2,,,80' | sed 's/,,/, ,/g'
2, ,,80
It doesn’t add a space on the second pair of “,,”. What gives? The problem is that sed matches the first pair of “,,” and moves on. Visually, we can represent what happened with something like this:
[],,, => [,,], => (, ,), => , ,[,] => end
So even with the g
modifier for the replacement, it ony matches the first pair of “,,”.
One solution with sed is to use a label and jump until no more matches are possible:
$ echo '2,,,80' | sed ':b; s/,,/, ,/; t b;'
2, , ,80
sed label and jump to loop over matches
Or this other solution, courtesy of Earnestly on the #sed libera IRC channel:
$ echo '2,,,80' | sed 's/,\(,[^,]\)\{0,\}/, \1/g'
2, , ,80
In either case, we now have a sed command to fill the empty fields. Test it with the data file itself:
$ sed ':b; s/,,/, ,/; t b;' ./data.csv
ID,NAME,POWER,SHIELD
1,Yoda,100,100
2, , ,80
3,Aayla,97,58
4, ,70,
5,Ahsoka,75,88
Yeah, now we have spaces between “, ,” where we originally had “,,”.
Final, working solution¶
Finally, let’s pass that output to column
and tell it to use the comma ,
as the separator for -t
:
$ sed ':b; s/,,/, ,/; t b;' ./data.csv | column -t -s,
ID NAME POWER SHIELD
1 Yoda 100 100
2 80
3 Aayla 97 58
4 70
5 Ahsoka 75 88
This works both on macOS sed and GNU sed.