While its pretty easy to do 'Group By' at database level, 'awk' enables us to do same at file level.
Consider a file as below:
# cat test.csv
anita 111
rama 555
david 555
raj 111
shaik 222
naren 222
Consider a file as below:
# cat test.csv
anita 111
rama 555
david 555
raj 111
shaik 222
naren 222
Here, I want to know how many names in the 1st column are having same values in the second column.
That's nothing but a simple Group By in MySQL or any other database.
So, load to a table and execute the query.
But, if the file having crores of data, MySQL query would be a costly operation.
In this time, 'awk' command will come for your help.
Below is the command:
# awk '{arr[$2]++;}END{for(i in arr)print i, arr[i] ;}' test.csv
555 2
111 2
222 2
Here, awk is fetching the second column into an array and counting the appearance.
If, the file is comma separated(CSV), use the below command:
# awk -F, '{arr[$2]++;}END{for(i in arr)print i, arr[i] ;}' test.csv
555 2
111 2
222 2