Planet MySQL

Planet MySQL -
  1. Orchestrator 1.2.9 GA released

    Orchestrator 1.2.9 GA has been released. Noteworthy: Added "ReadOnly" (true/false) configuration param. You can have orchestrator completely read-only Added "AuthenticationMethod": "multi": works like BasicAuth (your normal HTTP user+password) only it also accepts the special user called "readonly", which, surprise, can only view and not modify Centralized/serialized most backend database writes (with hundreds/thousands monitored servers it was possible or probable that high concurrency led to too-many-connections openned on the backend database). Fixed evil evil bug that would skip some checks if binary logs were not enabled Better hostname resolve (now also asking MySQL server to resolve hostname; resolving is cached) Pseudo-GTID (read here, here, here) support now considered stable (apart from being tested it has already been put to practice multiple times in production at Outbrain, in different planned and unplanned crash scenarios) I continue developing orchestrator as free and open source at my new employer,    
  2. Making HAProxy 1.5 replication lag aware in MySQL

    HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post, explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slaves setup to spread the read load across multiple slaves. However with MySQL replication, another factor comes into play: replication lag. In this case the approach mentioned for Percona XtraDB Cluster does not work that well as the check we presented only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.Agent checks in HAProxyHAProxy 1.5 allows us to run an agent check, which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be ‘up’ or ‘down’, but also a weight.What is an agent? It is simply a program that can be accessed from a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:Mark the server as down in HAProxy if replication is not workingSet the weight to 100% if the replication lag is < 10sSet the weight to 50% if the replication lag is >= 10s and < 60sSet the weight to 5% in all other situationsWe can use a script like this:$ less agent.php = 10 && $lag < 60){ return "up 50%"; } else return "up 5%"; } set_time_limit(0); $socket = stream_socket_server("tcp://$port", $errno, $errstr); if (!$socket) { echo "$errstr ($errno) n"; } else { while ($conn = stream_socket_accept($socket,9999999999999)) { $cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee "$query" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '"; exec("$cmd",$lag); $weight = set_weight($lag[0]); unset($lag); fputs ($conn, $weight); fclose ($conn); } fclose($socket); } ?>If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:$ php agent.php 6789 3306You will also need a dedicated MySQL user:mysql> GRANT REPLICATION CLIENT ON *.* TO 'haproxy'@'' IDENTIFIED BY 'haproxy_pwd';When the agent is started, you can check that it is working properly:# telnet 6789 Trying Connected to Escape character is '^]'. up 100% Connection closed by foreign host.Assuming it is run locally on the app server, that 2 replicas are available ( and and that the application will send all reads on port 3307, you will define a frontend and a backend in your HAProxy configuration like this:frontend read_only-front bind *:3307 mode tcp option tcplog log global default_backend read_only-back backend read_only-back mode tcp balance leastconn server slave1 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions server slave2 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessionsDemoNow that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.No lag# Slave1 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 0 # Slave2 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 0 # HAProxy $ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19 # pxname,svname,status,weight read_only-front,FRONTEND,OPEN, read_only-back,slave1,UP,100 read_only-back,slave2,UP,100 read_only-back,BACKEND,UP,200Slave1 lagging# Slave1 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 25 # Slave2 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 0 # echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19 # pxname,svname,status,weight read_only-front,FRONTEND,OPEN, read_only-back,slave1,UP,50 read_only-back,slave2,UP,100 read_only-back,BACKEND,UP,150Slave2 down# Slave1 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 0 # Slave2 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: NULL # echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19 # pxname,svname,status,weight read_only-front,FRONTEND,OPEN, read_only-back,slave1,UP,100 read_only-back,slave2,DOWN (agent),100 read_only-back,BACKEND,UP,100ConclusionAgent checks are a nice addition in HAProxy 1.5. The setup presented above is a bit simplistic though: for instance, if HAProxy fails to connect to the agent, it will not mark the corresponding as down. It is then recommended to keep a regular health check along with the agent check.Astute readers will also notice that in this configuration, if replication is broken on all nodes, HAProxy will stop sending reads. This may not be the best solution. Possible options are: stop the agent and mark the servers as UP using the stats socket or add the master as a backup server.And as a final note, you can edit the code of the agent so that replication lag is measured with Percona Toolkit’s pt-heartbeat instead of Seconds_Behind_Master.The post Making HAProxy 1.5 replication lag aware in MySQL appeared first on MySQL Performance Blog.
  3. Drop Table slow

    A few days back we were given a MySQL database hosted in EC2 with around 5TB data to move some of the tables to a new server.  The intention was to share the tables between 2 EC2 instances. Since AWS had the option to take online snapshots, the plan was to take a snapshot, create a new machine with that snapshot and drop the unwanted tables.So everything went as planned until creating a new machine with the snapshot.  The real challenge was dropping the unwanted tables.  It took around 4 minutes to Drop a table whose size is 20GB.  It took 20 minutes to drop a 100GB table. The time kept on increasing for larger tables. MySQL even went to “defunct” when we killed the drop query and at times crashed. To track down this issue we executed drop table in one session and checked the processlist from another session that gave the below output.mysql> show processlist \G*************************** 1. row ***************************     Id: 2352   User: dbuser   Host: localhost     db: dbCommand: Query   Time: 2573  State: checking permissions   Info: DROP TABLE `test`I wondered if it is a MySQL user related permission or a OS related one. Now I went on to check the “InnoDB status”. Found something interesting in the “ROW OPERATIONS”. Printing it below--------------ROW OPERATIONS--------------0 queries inside InnoDB, 0 queries in queue1 read views open inside InnoDBMain thread process no. 6117, id 47405873539392, state: doing insert buffer mergeNumber of rows inserted 167872705, updated 1018100, deleted 55251906, read 1953144114171.93 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/sInnoDB Change Buffering : From MySQL reference manual “Index changes resulting from SQL statements, which could normally involve random I/O operations, are held back and performed periodically by a background thread. This sequence of operations can write the disk blocks for a series of index values more efficiently than if each value were written to disk immediately."So here is what had really happened. The given DB server is big in data and used to get frequent inserts and didn't have enough ram to accumulate all the pages in “InnoDB buffer pool”. So it had to cache all the secondary index changes to “Insert Buffer”. These cached changes are flushed to disk only when the pages are loaded to the “buffer pool” or when the server is idle. When we took a snapshot and mounted it in a new server the database was idle and so the InnoDB main thread started merging all the changes cached in “insert buffer” to the disk. So it was the “Innodb main thread” that held the lock on those tables and the drop statement that we executed has waited for the InnoDB main thread to complete the insert buffer merge process. So we waited for the buffer merge process to complete and then we executed the drop statements.All those tables got dropped in seconds. Even a 700GB table got dropped in 5 to 10 seconds.Also setting “innodb_fast_shutdown=0” and shutting down MySQL does the buffer merge operation.But what I feel is “buffer merging by keeping the server idle or setting “innodb_fast_shutdown=0” and restarting results in same amount of time”. Also the status “Checking permissions” that we got in the processlist output is very misleading. It is natural for everybody to thing that the drop statement is waiting for MySQL user permission or OS related permission. But actually it was waiting for permission from the InnoDB main thread. I still wonder why we need to buffer merges for a table that is to be dropped. I need to think further.Note : There might some some other reasons too for drop table slowness. But this is the one I have faced till date.--Aravinth C
  4. Monitor MySQL Performance Interactively With VividCortex

    If you’re monitoring MySQL performance on dozens or hundreds of servers, chances are you have a “rainbow chart” – a time-series chart with hundreds of tiny area graphs stacked on top of each other, crawling imperceptibly one pixel at a time across the big-screen monitor in your office. The trouble with these charts is they’re hard to see. It takes many minutes for enough new pixels to display after a change. In the meantime you can’t see the change clearly. At VividCortex, we think we’ve found a better way to keep tabs on what’s going on in your infrastructure: a bubble visualization. It is compact, and immediately communicates current status and trend, with visualizations that your eye interprets in a glance. This is the Activity layout on the Hosts dashboard. It’s designed to scale to hundreds, even thousands of hosts. Tabular layouts and strip-charts won’t do the trick, but live, interactive bubbles will. The bubbles update live, as agents send metrics to the APIs. They’re actually a D3 physics simulation of heat, gravity, friction, and several other forces. A moment later, this environment’s load characteristics changed dramatically! The bubbles are sized according to the host’s load, so busier hosts are larger. The color is the trend: redder bubbles are currently trending1 upwards, and bluer ones are getting less loaded. Similar hosts are grouped together in a single circle. It takes a lot of words to explain, but when you see it changing live, you get it instantly. You don’t see it in this screenshot, but as the visualization auto-selects the hottest host, the right-hand sidebar of the page updates with the host’s details and history, making this a great dashboard to display on that big screen for everyone to see. If you’re using it interactively, quick-targets activate so you can drill down into the auto-selected host. If you’re not getting immediate feedback from your MySQL monitoring system, what are you waiting for? Start your 30-day free trial today. Installation is super-fast and unintrusive. 1 In case you’re curious about the trend calculation, we use a MACD – Moving Average Convergence-Divergence. The exact metric varies by host type, but is either a utilization metric such as CPU busy, or a throughput metric such as queries per second.
  5. Scaling TokuDB Performance with Binlog Group Commit

    TokuDB offers high throughput for write intensive applications, and the throughput scales with the number of concurrent clients.  However, when the binary log is turned on, TokuDB 7.5.2 throughput suffers.  The throughput scaling problem is caused by a poor interaction between the binary log group commit algorithm in MySQL 5.6 and the way TokuDB commits transactions.   TokuDB 7.5.4 for Percona Server 5.6 fixes this problem, and the result is roughly an order of magnitude increase in SysBench throughput for in memory workloads. MySQL uses two phase commit protocol to synchronize the MySQL binary log with the recovery logs of the storage engines when a transaction commits.  Since fsync’s are used to ensure the durability of the data in the various logs, and fsync’s can be very slow, the fsync can easily become a bottleneck.  A group commit algorithm can be used to amortize the fsync cost over many log writes.  The binary log group commit algorithm is intended to amortize the cost of the binary log fsync’s over many transactions. The binary log group commit blog describes how two phase commit works with the binary log in MySQL 5.6. When a transaction commits, a transaction runs through a prepare phase and a commit phase.  Hey, it is called two phase commit for a reason. During the prepare phase, TokuDB writes a prepare event to its recovery log and uses a group commit algorithm to fsync its recovery log. Since there can be many transactions in the prepare phase concurrently, the transaction prepare throughput scales with the number of transactions. During the commit phase, the transaction’s write events are written to the binary log and the binary log is fsync’ed. MySQL 5.6 uses a group commit algorithm to fsync the binary log.  Also during the commit phase, TokuDB writes a commit event to its recovery log and uses a group commit algorithm to fsync its recovery log. Since the transaction has already been prepared and the binlog has already been written, the fsync of the TokuDB recovery log is not necessary. XA crash recovery will commit all of the prepared transactions that the binary log knows about and abort the others. Unfortunately, MySQL 5.6 serializes the commit phase so that the commit order is the same as the write order in the binary log. Since the commit phase is serialized, TokuDB’s group commit algorithm is ineffective. Luckily, MySQL 5.6 tells TokuDB to ignore durability in the commit phase (the HA_IGNORE_DURABILITY property is set), so TokuDB does not fsync its recovery log. This fixes the throughput bottleneck caused by serialized fsync’s of the TokuDB recovery log during the commit phase of the two phase commit. Since MariaDB uses a different binlog group commit algorithm, we have some additional work to ensure that TokuDB works nicely with it. We used the SysBench update non-indexed test to measure throughput and will post a detailed blog with results later.     The post Scaling TokuDB Performance with Binlog Group Commit appeared first on Tokutek.