Tuesday, 25 November 2014

Linux: awk - Group By Count File Data

While its pretty easy to do 'Group By' at database level, 'awk' enables us to do same at file level.

Consider a file as below:

# cat test.csv
anita 111
rama 555
david 555
raj 111
shaik 222
naren 222


Here, I want to know how many names in the 1st column are having same values in the second column. 
That's nothing but a simple Group By in MySQL or any other database. 
So, load to a table and execute the query.

But, if the file having crores of data, MySQL query would be a costly operation.

In this time, 'awk' command will come for your help.

Below is the command:

# awk  '{arr[$2]++;}END{for(i in arr)print i, arr[i] ;}' test.csv
555 2
111 2
222 2


Here, awk is fetching the second column into an array and counting the appearance.

If, the file is comma separated(CSV), use the below command:



# awk  -F, '{arr[$2]++;}END{for(i in arr)print i, arr[i] ;}' test.csv
555 2
111 2
222 2


No comments:

Post a Comment

Note: only a member of this blog may post a comment.