These days, high availability is an imperative. Data is distributed across multiple
instances, or even multiple datacenters. Clusters can be scaled out to more nodes.
Failures can cause cluster reconfigurations. This raises the question: how does the
application know which database node to access? How does an application detect
that the database topology has changed? How do we shield applications from the
complexity of the underlying databases?
At some point, the man-in-the-middle concept became popular and database
environments started integrating proxies. This whitepaper discusses what proxies are,
what they are used for, and how to build a highly available and highly controllable
database environment using modern proxies.
1.1. What Is a Database Proxy?
A proxy is software that handles connectivity between two sides. In the context of
databases, a proxy sits in the middle, between the application and the database. The
application connects to the proxy, which forwards the connections to the database. Let's
pause here and analyze this statement: what do we gain by using a proxy? For starters,
one huge gain is that the application connects only to the proxy. In the database world,
it is not easy to determine where traffic should be directed. There are writeable masters,
intermediate masters and read replicas, and the replication topology constantly evolves.
Hardcoding connectivity patterns is not a good idea. On the other hand, code that tracks
topology changes needs to be carefully planned, designed and tested. This is where the
proxy comes in. With a proxy, applications connect to it (or to a pool of proxies) and can
expect that traffic will be routed to a functioning database.
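To make the routing idea concrete, here is a minimal sketch of read/write splitting over a changing topology. It is not taken from any particular proxy: the `Backend` and `Router` classes, the "writer"/"reader" role labels and the host names are hypothetical, and a real proxy would discover roles and health by monitoring the servers rather than having them set by hand. Query classification is deliberately naive (a plain `SELECT` is a read, anything else, including `SELECT ... FOR UPDATE`, goes to the writer).

```python
from dataclasses import dataclass


@dataclass
class Backend:
    host: str
    role: str            # hypothetical labels: "writer" or "reader"
    healthy: bool = True


class Router:
    """Toy read/write splitter; the application only ever talks to this object."""

    def __init__(self, backends):
        self.backends = backends
        self._rr = 0  # round-robin counter used to spread reads over replicas

    @staticmethod
    def _is_read(sql: str) -> bool:
        s = sql.lstrip().upper()
        # Naive classification: SELECT ... FOR UPDATE takes locks, so send it to the writer.
        return s.startswith("SELECT") and "FOR UPDATE" not in s

    def route(self, sql: str) -> Backend:
        # Re-evaluate the topology on every call so role/health changes take effect at once.
        writers = [b for b in self.backends if b.role == "writer" and b.healthy]
        readers = [b for b in self.backends if b.role == "reader" and b.healthy]
        if self._is_read(sql) and readers:
            target = readers[self._rr % len(readers)]
            self._rr += 1
            return target
        if writers:
            # Writes, and reads when no replica is available, go to a healthy writer.
            return writers[0]
        raise RuntimeError("no healthy writer available")
```

Because the router looks at the current roles and health flags on each call, a failover (for example, marking the old master unhealthy and relabelling a replica as "writer") changes where writes go without any change to the application code.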
Since traffic is relayed by the proxy, the proxy can also be a great source of information
about the traffic itself. It can provide statistics such as the number of queries executed
per second and their execution times, including the 95th percentile, maximum, minimum
and average, all based on the collected metrics.
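As an illustration of the statistics mentioned above, the sketch below summarizes a list of query latencies observed over a time window. The function and field names are hypothetical, and the 95th percentile uses the nearest-rank method, which is only one of several definitions; production proxies often keep histograms or digests instead of raw samples.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryStats:
    count: int
    qps: float      # queries per second over the window
    min_ms: float
    max_ms: float
    avg_ms: float
    p95_ms: float


def percentile(sorted_vals, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_vals) / 100)
    return sorted_vals[max(rank, 1) - 1]


def summarize(latencies_ms, window_s):
    """Summarize latencies (in milliseconds) collected over window_s seconds."""
    if not latencies_ms:
        raise ValueError("no samples in window")
    vals = sorted(latencies_ms)
    return QueryStats(
        count=len(vals),
        qps=len(vals) / window_s,
        min_ms=vals[0],
        max_ms=vals[-1],
        avg_ms=sum(vals) / len(vals),
        p95_ms=percentile(vals, 95),
    )
```

For example, 20 queries taking 1 ms, 2 ms, ..., 20 ms during a 10-second window give 2.0 queries per second, an average of 10.5 ms and a nearest-rank 95th percentile of 19 ms.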
Advanced proxies can also alter the traffic. As everything passes through them, such
proxies can give an administrator a high degree of control over queries: queries can
be cached, rewritten, rerouted, stalled or killed. This allows the DBA to shape the traffic
and react to issues immediately, without requiring an application developer to modify
and redeploy the application.
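The kind of control described above is usually expressed as rules that match incoming queries and attach an action to them. The sketch below is a hypothetical, simplified rule engine, not the configuration model of any specific proxy: the `Rule` and `Decision` types, the action names ("block", "rewrite", "reroute", "cache"), the "analytics" target and the table names are all assumptions. For simplicity the first matching rule wins; real proxies may support rule chaining, per-user rules and more actions (stalling is not modelled here).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    pattern: str                       # regex matched against the incoming query text
    action: str                        # hypothetical: "block", "rewrite", "reroute", "cache"
    replacement: Optional[str] = None  # used by "rewrite"
    hostgroup: Optional[str] = None    # used by "reroute"
    ttl_s: int = 0                     # used by "cache"


@dataclass
class Decision:
    sql: str
    hostgroup: str = "default"
    cache_ttl_s: int = 0
    blocked: bool = False


def apply_rules(sql, rules):
    """Return what the proxy should do with `sql`; the first matching rule wins."""
    d = Decision(sql=sql)
    for r in rules:
        if not re.search(r.pattern, sql, re.IGNORECASE):
            continue
        if r.action == "block":
            d.blocked = True                   # reject the query instead of forwarding it
        elif r.action == "rewrite":
            d.sql = re.sub(r.pattern, r.replacement, sql, flags=re.IGNORECASE)
        elif r.action == "reroute":
            d.hostgroup = r.hostgroup          # send it to a different group of servers
        elif r.action == "cache":
            d.cache_ttl_s = r.ttl_s            # serve repeated results from the proxy
        break
    return d


# Example rules a DBA might add without touching application code.
RULES = [
    Rule(r"^DELETE FROM orders\s*$", "block"),                     # unqualified DELETE
    Rule(r"ORDER BY RAND\(\)", "rewrite", replacement="ORDER BY id"),
    Rule(r"FROM big_report", "reroute", hostgroup="analytics"),
    Rule(r"^SELECT COUNT\(\*\) FROM products", "cache", ttl_s=60),
]
```

With these rules, `DELETE FROM orders` is blocked, `SELECT * FROM t ORDER BY RAND() LIMIT 1` is forwarded as `SELECT * FROM t ORDER BY id LIMIT 1`, and a query that matches nothing passes through unchanged to the default servers.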
Finally, proxies can help scale the environment, not only by sending traffic to multiple
slaves but also by enabling sharded setups through traffic-routing logic defined within
the proxy. As you can see, an advanced database proxy is not just a packet-routing
device; it can be used in multiple ways, giving the operations team more options for
managing the database tier.
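Sharding through proxy routing logic boils down to extracting a sharding key from the query and mapping it to a backend. The sketch below assumes a hypothetical `customer_id` sharding key and four hypothetical shard hosts, and maps keys with CRC32 modulo the number of shards; range-based maps, lookup directories and consistent hashing are common alternatives.

```python
import re
import zlib

# Hypothetical shard hosts; the list order defines the shard numbers.
SHARDS = ["shard0.db", "shard1.db", "shard2.db", "shard3.db"]

# Hypothetical sharding key: a literal "customer_id = <number>" in the query.
KEY_RE = re.compile(r"\bcustomer_id\s*=\s*(\d+)", re.IGNORECASE)


def shard_for_key(key: int) -> str:
    """Map a sharding key to a shard with a stable hash (CRC32 modulo N)."""
    return SHARDS[zlib.crc32(str(key).encode("utf-8")) % len(SHARDS)]


def route(sql: str):
    """Return the shard for a query, or None if no sharding key was found."""
    m = KEY_RE.search(sql)
    if m is None:
        # A real proxy might fan the query out to all shards or reject it.
        return None
    return shard_for_key(int(m.group(1)))
```

One design caveat: with plain modulo hashing, adding a shard remaps most keys, which is why setups that expect to grow often prefer consistent hashing or an explicit key-to-shard directory.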
Introduction