Wednesday 29 January 2014

MySQL: SUBSTRING_INDEX - Select Patterns

Consider, a MySQL table having values in a column like below:

SELECT location  FROM geo LIMIT 3;

"location"
"India.Karnataka.Shimoga.Gopala"
"India.Karnataka.Bengaluru.BTM"
"India.Karnataka.Chikmaglore.Koppa"

My requirement is to take only 4th value from each of the rows(such as, Gopala,BTM,Koppa). 
I don't want to display remaining values. 
Its same as what 'cut' command will do in Linux.

For this, we can use SUBSTRING_INDEX function.

SELECT SUBSTRING_INDEX(location,'.',-1)  from geo LIMIT 3;
"location"
"Gopala"
"BTM"
"Koppa"

Syntax: SUBSTRING_INDEX(string,delimiter,count)
Here count means column number based on delimiter. 
Negative value indicates that the column numbers calculated from right side.

So, if I give '-2' instead of  '-1':

SELECT SUBSTRING_INDEX(location,'.',-2)  from geo LIMIT 3;
"location"
"Shimoga.Gopala"
"Bengaluru.BTM"
"Chikmaglore.Koppa"

Linux: awk - Grouping Data in a file

Consider the below file:

# cat test.csv
aa 1 qwer
ab 2 tyui
aa 3 poiu
ab 2 mnb
bb 1 njio
ba 2 njtwe

test.csv  is a tab separated file with 3 columns. 

Here, I want to segregate the whole lines with matching 1st and 2nd columns into separate files. 

Like below:
# cat file_bb_1.csv
bb 1 njio

# cat file_ba_2.csv
ba 2 njtwe

# cat file_ab_2.csv
ab 2 tyui
ab 2 mnb

# cat file_aa_3.csv
aa 3 poiu

# cat file_aa_1.csv
aa 1 qwer


Though you can do this manually, think of a file with more than million lines.

Here, 'awk', being an powerful data manipulation tool,comes to our help. 
Below is the command we can use:


#cat test.csv | awk '{a=$1;b=$2; print $0 >> "file_" a "_" b ".csv"}'


[You can give any name instead of 'file_']

MySQL: Master-Master Circular Replication

While MySQL is one of the simplest RDBMS, it gives vast opportunity to play around!

One of the common question asked in most of the interviews is about Master-Master Circular Replication(means, both servers are master and both are slaves), though its not so practical to be use in a environment where you need high performance.. !!


Here are the brief steps for setting up such a set up..! :

* Install MySQL on both the servers which are our masters

* Configure my.cnf with below options:

server-id = 2 
log_bin = /var/log/mysql/mysql-bin.log
log-slave-updates

[give your convenient values, but make sure server-id is unique]

Note: log-slave-updates-> Normally, a slave does not log to its own binary log any updates that are received from a master server. This option tells the 
slave to log the updates performed by its SQL thread to its own binary log.
This option is must in any chain replication, as well.


* Start MySQL on both the servers and make a note of 'SHOW MASTER STATUS' values.

* Using the above values of each, set the replication parameters on the other 
server. And start slave.

slave stop; 
CHANGE MASTER TO MASTER_HOST = '3.3.3.3', 
MASTER_USER = 'replicator', 
MASTER_PASSWORD = 'password',
MASTER_LOG_FILE = 'mysql-bin.000001',
MASTER_LOG_POS = 107; 
slave start;  

MySQL: Why Server ID ?


In any MySQL replication  set up, server ID will be a common variable seen in my.cnf. Ans usually it will be unique across the servers.

[mysqld]

server-id = 2

But why its so important?

For any normal Master-Slave setup, it doesn't matter whatever the number you set.
However, when you go for a Multi-master or circular replication, its a basic need.

Consider an  example, we have 3 servers A, B and C. Their server ids are accordingly 11, 22 and 33
They are replicating in circular manner (multi-master) like A is master of B server, B is master of C server and C is master of A server.

Now when any mysql client will execute insert/update/delete statement on A server, it will be logged in the binary of A server with server id 11.

As that statement logged in binary of A server, it will go to B server (because B is slave of A), executed there and logged in binlog of B server with the same server id 11.  (why same server id? because originally that statement initiated by A server with server id 11 and on B server its executed by sql thread not regular mysql client).

Now again this statement will go to C server (because C is slave of B), executed there and logged in the binlog of C server with the same server id 11.

As C is master for A server, when the statement will come to A server via binlog of C server, sql thread of A server will compare the server id and will find that its same 11 means this statement is originally initiated by the server itself and no need to execute so finally sql thread will skip that statement.

If sever ID wasn't unique, this setup would have entered into  an endless loop state..!