How to start your NETWAYS Managed Database

In the first database tutorial, Sebastian already explained what Vitess is all about and what possibilities it offers for your application compared to an ordinary database.

In this tutorial, I would like to explain how easy it is for you to start your NETWAYS Managed Database(s) within a few minutes in our Vitess Cluster!

What Makes Your Database Special

It has to be mentioned in advance that your setup is always highly available. We run Vitess in a Kubernetes cluster, which is distributed evenly across our two data centers.

We always replicate the data from one data center to the other. A managed database therefore consists of at least two instances – a primary and at least one replica. If the primary is located in one data center, the replica automatically moves to the other. So, if one data center fails, Vitess fails over and continues to run the database in the second data center without affecting your application.

Furthermore, disaster recovery is included out of the box. As soon as the database is created, an initial backup is made, and subsequently a daily backup (MySQL dump) is created. The data is copied to S3 in Xtrabackup format, and in case of emergency the data is restored from this storage.

How to Get Your First Database Started

Enough of the theory – how does starting a Managed DB work in practice? First of all, you have to register on our NETWAYS Web Services (NWS) platform.

As soon as you have created your account and entered your payment method, you are ready to go!

First, you start your Vitess Cluster. Within a Vitess Cluster you can start as many databases as you want. Under “Manage Contract” you have the option to rename your cluster.

Now you have to start the first database. Just click on “1. create a new database”.

Next, you can name your DB and choose in which Vitess cluster the database should be started (if you have more than one cluster).

Now you have to choose one of our four plans. These plans differ in replica size, included backup size and included traffic. In addition, each higher plan is assigned more resources (in the form of CPU and memory). So if you have a lot of load, it is recommended to choose a higher plan, because it can handle more queries per second.

After that, you can decide how many replicas should be created. The smallest value is always two (consisting of one primary and one replica). If your application is very read-intensive, you can increase the number of replicas as much as you like and thus distribute the load efficiently.

The retention period of the daily backups can also be freely determined. By default, the backup of the last day is kept. If you want the backups to be kept for several days, you can easily set this by increasing the number of days to the desired duration.
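
The retention logic can be sketched in a few lines of Python (an illustrative model, not our actual backup tooling; the dates and the pruning rule are simplified):

```python
import datetime

def prune(backups, retention_days, today):
    # Keep only backups that are no older than the configured retention
    # period; everything before the cutoff date gets deleted.
    cutoff = today - datetime.timedelta(days=retention_days)
    return [b for b in backups if b >= cutoff]

today = datetime.date(2021, 6, 10)
backups = [datetime.date(2021, 6, d) for d in range(1, 11)]  # June 1 to 10

# With a 3-day retention, only the backups from June 7 to 10 survive.
print(prune(backups, 3, today))
```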

Last but not least, you can select the replication mode. Here, you can set how your data should be replicated.

In async mode the primary is “fired” with transactions and the replicas simply try to keep up with the primary as well as possible – but time delays (seconds behind master) can occur. In case of a failover, a replica that did not receive the latest transactions, because it was a few seconds behind, may be promoted to primary.

This cannot happen in semi-sync mode. In semi-sync mode MySQL always expects at least one replica to have acknowledged the last transaction. This ensures that at least one replica is always in exactly the same state as your primary. If the primary fails, Vitess automatically switches to the most up-to-date replica and appoints it as the new primary – so operation continues without data loss.

A piece of advice: ideally, a database setup in semi-sync mode consists of three instances – a primary and two replicas. One of these two replicas is always up to date with the primary. If a failover occurs, the promoted replica can continue to work, because it still has an additional replica that acknowledges new transactions.

If this were not the case (and you had only one replica), the replica appointed as primary would now lack a counterpart to acknowledge new transactions. It would have to wait until a new replica is up before it could commit transactions (this only affects writes and deletes; read queries are not affected).
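
The difference between the two replication modes can be sketched with a small Python model (purely illustrative, not Vitess code):

```python
def commit_async(primary, txn):
    # Async: the primary commits immediately; replicas apply the
    # transaction later and may lag a few seconds behind.
    primary.append(txn)

def commit_semi_sync(primary, replica, txn):
    # Semi-sync: the commit only counts once at least one replica has
    # acknowledged the transaction, so that replica is never behind.
    primary.append(txn)
    replica.append(txn)

# Async mode: three commits, but the replica only managed to apply two.
primary, replica = [], []
for txn in ("t1", "t2", "t3"):
    commit_async(primary, txn)
replica.extend(["t1", "t2"])        # "seconds behind master": t3 is missing
async_promoted = list(replica)      # primary fails, this replica is promoted
print("async:", async_promoted)     # ['t1', 't2'] -> t3 is lost

# Semi-sync mode: each commit waits for the replica's acknowledgement.
primary, replica = [], []
for txn in ("t1", "t2", "t3"):
    commit_semi_sync(primary, replica, txn)
semi_sync_promoted = list(replica)
print("semi-sync:", semi_sync_promoted)  # ['t1', 't2', 't3'] -> no data loss
```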

Now click on “Create” and your database will be started! Et voilà – after a short moment your database is available and you can get started. For your database you now have the options “Connect”, “Edit” and “Destroy”.

“Connect” shows you how to connect to the database – as usual with MySQL: host, user, password and SSL options (we reject non-SSL connections).

There is one more special feature: your SQL connection will include the database name!

If you add @replica to the database name, then you connect to the replica (which is read-only).

So, if your application supports it (e.g. Ruby on Rails), you can specify your primary database connection and your replica database connection – and subsequently, all SELECT statements go to the replicas and all data manipulation statements go to the primary. This way your setup can scale very well, as your application can direct the load of the read queries to the replicas.
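
Such a read/write split could look roughly like this in Python (a hypothetical sketch; the host and database names are invented, and a real application would use its framework’s or driver’s configuration instead):

```python
def connection_params(database, read_only=False):
    # With "@replica" appended to the database name the connection goes to
    # a read-only replica; without it, it goes to the primary.
    return {
        "host": "db.example.netways.de",  # placeholder host
        "database": (database + "@replica") if read_only else database,
        "ssl": True,                      # non-SSL connections are rejected
    }

def route(sql, database):
    # Naive split: SELECTs go to a replica, all data manipulation
    # statements (INSERT/UPDATE/DELETE, ...) go to the primary.
    read_only = sql.lstrip().upper().startswith("SELECT")
    return connection_params(database, read_only=read_only)

print(route("SELECT * FROM users", "shopdb")["database"])          # shopdb@replica
print(route("UPDATE users SET name = 'x'", "shopdb")["database"])  # shopdb
```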

With “Edit” you can configure your database.

So, if you run out of storage, you can simply switch to the next bigger plan here. Or you can increase the number of replicas and backups or adjust the replication mode again.

If the largest plan is no longer sufficient, our scaling via sharding takes effect. A detailed tutorial on this will follow.

With “Destroy” you delete your database again.

Under “User” you will find all the credentials for the automatically created user. Furthermore, you can create (and delete) additional users and assign passwords.

Under “Backups” you will find the backup that is automatically created once a day by default.

You can also create a backup manually at any time (e.g. in case you are working on your database and want to make a last database backup) by clicking on “Create Backup” and selecting the database to be backed up (if you have started several in your cluster).

Depending on which retention cycle you have set for your database, your intermediate backups will be deleted in the evening.

If you have several backups here, you can delete them – only the latest backup cannot be deleted, because it is used to restore the replicas if necessary.

Under “Graphs” we visualize the utilization of your setup (queries/second), the storage allocation and the slave lag for all replicas.

Under “Manage contract” you can rename or cancel your Vitess cluster.

Now that’s all the magic behind starting your first NETWAYS Managed Database built with Vitess! 

What is Vitess?

Back in 2010, a solution was created to solve the massive MySQL scalability challenges at YouTube – and thus Vitess was born. In 2018, the project became part of the Cloud Native Computing Foundation, and since 2019 it has been listed as one of the graduated projects. It is now in good company with other prominent CNCF projects like Kubernetes, Prometheus and more.

Vitess is an open source MySQL-compatible database clustering system for horizontal scaling – you could also say it is a sharding middleware for MySQL. It combines and extends many important SQL features with the scalability of a NoSQL database, and solves multiple challenges of operating ordinary MySQL setups. With Vitess, MySQL becomes massively scalable and highly available. Its nature is cloud native, but it can also be run on bare metal environments.

Architecture

Vitess consists of multiple additional components, such as VTGate, VTTablet, vtctld and a topology service backed by etcd or ZooKeeper. The application connects either via Vitess’ native database drivers or via the MySQL protocol, which means any MySQL clients and libraries are compatible.
The application connects to the so-called VTGate, a kind of lightweight proxy which knows the state of the MySQL instances (VTTablets) and – in the case of sharded databases – where which data is stored. This information is kept in the Topology Service. The VTGate routes the queries to the corresponding VTTablets accordingly.
A tablet, on the other hand, is the combination of a VTTablet process and the MySQL instance itself. If healthy, it runs in primary, replica or read-only mode. Per database, there is replication between one primary and multiple replicas. If a primary fails, a replica will be promoted, and Vitess helps with the process of reparenting. This can all be fully automated. New, additional or failed replicas are instantiated from scratch: they receive the data of the latest available backup and are hooked up to the replication. As soon as such a replica catches up, it is part of the cluster and VTGate will forward queries to it.
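
The routing idea can be sketched in Python (a toy model; the tablet names, the topology layout and the shard function are invented for illustration):

```python
# A tiny "VTGate": look up in the topology which tablet serves which
# shard of a keyspace, and route a query to the primary or a replica.
topology = {
    "commerce": {                       # keyspace with two shards
        "-80": {"primary": "tablet-101", "replica": ["tablet-102"]},
        "80-": {"primary": "tablet-201", "replica": ["tablet-202"]},
    }
}

def shard_for(keyspace_id):
    # Toy sharding scheme: split the keyspace-id range in half,
    # mirroring Vitess' "-80" / "80-" shard naming.
    return "-80" if keyspace_id < 0x80 else "80-"

def route(keyspace, keyspace_id, tablet_type="primary"):
    shard = topology[keyspace][shard_for(keyspace_id)]
    if tablet_type == "primary":
        return shard["primary"]
    return shard["replica"][0]

print(route("commerce", 0x12))                         # tablet-101
print(route("commerce", 0x9a, tablet_type="replica"))  # tablet-202
```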

Scalability Philosophy

Vitess is designed to run small instances of no more than 250 GB of data. If your database becomes bigger, it needs to be split into multiple instances. This approach has several operational benefits. If an instance fails, it can be recovered much faster with less data; the time to recover is decreased due to faster backup transfers. The replication also tends to be happier with less delay. Being able to move instances and place them on different nodes for improved resource usage is a plus, too.

Data durability is achieved through replication. Outages and failures of specific failure domains are quite normal. In cloud native environments it is even more normal that nodes and pods are drained or newly created, and you try to be as flexible as possible in the face of such events. Vitess is designed for exactly this, and with its fully automatic recovery and reparenting capabilities it fits perfectly into a cloud native environment such as Kubernetes.

Furthermore, Vitess is meant to be run across data centres, regions or availability zones. Each domain has its own VTGate and pool of tablets; this concept is called “cells” in Vitess. We at NETWAYS Web Services distribute replicas evenly across our availability zones to survive a complete outage of one zone. It would also be possible to run replicas in different geographic regions for better latency and a better experience for customers abroad.

Additional Features

Besides its cloud native nature and possibilities of endless scalability, there are even more handy and clever features, such as:

  • Connection pooling and Deduplication
    Usually, MySQL needs to allocate some memory (~256 KB – 3 MB) for each connection. This memory is used for the connections only, not for accelerating queries. Vitess instead creates very lightweight connections leveraging Go’s concurrency support. Those frontend connections are then pooled onto fewer connections to the MySQL instances, so that it is possible to handle thousands of connections smoothly and efficiently. Additionally, it registers identical in-flight requests and holds them back, so that only one query will hit your database.
  • Query and transaction protection
    Have you ever had the need to kill long-running queries that took down your database? Vitess limits the number of concurrent transactions and sets proper timeouts for each. Queries that take too long will be terminated. Also, poorly written queries without a LIMIT clause are rewritten and limited before they can hurt the system.
  • Sharding
    Its built-in sharding features enable growing the database via sharding – without the need to add additional application logic.
  • Performance Monitoring
    Performance analysis tools let you monitor, diagnose, and analyze your database performance.
  • VReplication and workflows
    VReplication is a key mechanism of Vitess. With VReplication, events of the binlog are streamed from a sender to a receiver. Workflows are – as the name suggests – flows to complete certain tasks. For instance, a workflow can move a running production table to a different database instance with close to no downtime (“MoveTables”). Streaming a subset of data into another instance can also be done with this concept. A “materialized view” comes in handy if you have to join data, but the tables are sharded on different instances.
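
The query-protection idea of limiting unbounded SELECTs can be sketched like this (an illustrative toy, not Vitess’ actual query rewriting; the default limit is an invented value, and the naive LIMIT check would misfire on e.g. column names containing “limit”):

```python
def protect(sql, max_rows=10000):
    # Append a row limit to SELECT statements that arrive without one,
    # so a poorly written query cannot pull the whole table at once.
    stripped = sql.strip().rstrip(";")
    upper = stripped.upper()
    if upper.startswith("SELECT") and "LIMIT" not in upper:
        return f"{stripped} LIMIT {max_rows}"
    return stripped

print(protect("SELECT * FROM logs"))         # SELECT * FROM logs LIMIT 10000
print(protect("SELECT id FROM t LIMIT 50"))  # already limited -> unchanged
```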

Conclusion

Vitess is a very powerful and clever piece of software! To be precise, it is software used by hyperscalers, brought to the masses and now available to everyone. If you want to know more: we will soon post more tutorials covering more advanced topics on a regular basis. The documentation at vitess.io is also a good source to find out more. If you want to try it yourself, there are multiple ways of doing so – the most convenient is to use our Managed Database product and trust in our experience.