Wednesday, 29 January 2014

Linux: awk - Grouping Data in a file

Consider the file below:

# cat test.csv
aa 1 qwer
ab 2 tyui
aa 3 poiu
ab 2 mnb
bb 1 njio
ba 2 njtwe

test.csv is a tab-separated file with 3 columns.

Here, I want to split the file so that all lines with the same values in the 1st and 2nd columns go into a file of their own.

The result should look like this:
# cat file_bb_1.csv
bb 1 njio

# cat file_ba_2.csv
ba 2 njtwe

# cat file_ab_2.csv
ab 2 tyui
ab 2 mnb

# cat file_aa_3.csv
aa 3 poiu

# cat file_aa_1.csv
aa 1 qwer


You could do this by hand for a small file, but imagine a file with more than a million lines.

Here, awk, a powerful data-manipulation tool, comes to our help.
Below is the command we can use:


# awk '{ out = "file_" $1 "_" $2 ".csv"; print >> out }' test.csv


[You can use any prefix in place of 'file_'.]
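One thing to watch: the command appends with '>>', so running it a second time adds every line to the existing output files again. Inside awk, '>' truncates a file only the first time the program opens it and appends on later prints to the same file, so using '>' keeps reruns clean. Below is a minimal sketch, assuming a POSIX-like shell with mktemp available. It rebuilds the sample test.csv (with real tabs) in a scratch directory, runs the split, and prints each output file.

```shell
#!/bin/sh
# Minimal sketch (assumes sh + mktemp): rebuild the sample test.csv in a
# scratch directory and split it by the 1st and 2nd columns.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Sample data from this post, with real tab separators.
printf 'aa\t1\tqwer\nab\t2\ttyui\naa\t3\tpoiu\nab\t2\tmnb\nbb\t1\tnjio\nba\t2\tnjtwe\n' > test.csv

# '>' inside awk truncates each output file only when this run first opens
# it; later prints to the same file append. Re-running therefore rebuilds
# the files instead of duplicating their lines.
awk '{ out = "file_" $1 "_" $2 ".csv"; print > out }' test.csv

# Show what was produced.
for f in file_*.csv; do
    echo "== $f"
    cat "$f"
done
```

Storing the file name in a variable also avoids relying on how a particular awk parses an unparenthesized concatenation after a redirection operator.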
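With a million-line file the number of distinct (1st column, 2nd column) pairs can also be large, and every output file awk writes stays open until close() is called or awk exits. Some awk implementations fail once that exceeds their open-file limit. One hedged workaround is to close each file right after writing to it, as in this self-contained sketch (again assuming sh and mktemp, and rebuilding the sample data in a scratch directory):

```shell
#!/bin/sh
# Sketch for inputs with very many distinct (column 1, column 2) pairs:
# close each output file after every write so only one is open at a time.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Sample data from this post, with real tab separators.
printf 'aa\t1\tqwer\nab\t2\ttyui\naa\t3\tpoiu\nab\t2\tmnb\nbb\t1\tnjio\nba\t2\tnjtwe\n' > test.csv

# A closed file is reopened on the next write, and '>' would truncate it
# again, so this variant must append with '>>'. Remove leftovers first.
rm -f file_*.csv

awk '{
    out = "file_" $1 "_" $2 ".csv"
    print >> out
    close(out)   # release the file right away; one open/close per line
}' test.csv
```

The trade-off is one open and one close per input line, which is noticeably slower on large inputs; whether you need it depends on your awk implementation and the system's open-file limit.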
