Google wants to add an analytical turbo to PostgreSQL

Google has once again added a cloud database to its portfolio. The company presented the AlloyDB preview at its virtual Google I/O conference, held May 11-12.

The American group has not yet communicated a date for general availability.

AlloyDB is an on-demand managed database built on the open source PostgreSQL RDBMS. Google already has several cloud services that support PostgreSQL, including Cloud Spanner and Cloud SQL for PostgreSQL.

“AlloyDB fills an important gap in Google’s database offering,” said Carl Olofson, analyst at IDC. “It is a fully relational DBMS capable of handling both analytics and transactions, as well as the mixed operations that we at IDC call analytical transaction processing.”

Google is also integrating its Vertex AI service with AlloyDB to allow users to use machine learning directly with the database.

Combine analytical and transactional processing

Vendors have been embracing this trend for a few years. IDC calls this capability ATP, Forrester Research talks about translytical databases, and vendors such as PingCAP call it “hybrid transactional and analytical processing.” Google Cloud uses the abbreviation HTAP.

Carl Olofson considers that AlloyDB’s functionality stands out from Google’s other database offerings, including Cloud Spanner and BigQuery. In his view, BigQuery is ideal for queries over large tables, while Spanner aims to handle distributed processing on databases deployed across multiple cloud regions. AlloyDB, for its part, is cut out for the HTAP approach.

As to why Google decided to build yet another database that supports PostgreSQL, Andi Gutmans, General Manager and VP of Engineering, Databases at Google Cloud, explains in an interview that the offering stems from customer demand. It is an argument we already heard from him when he worked at AWS.

According to Andi Gutmans, Cloud SQL for PostgreSQL customers are happy with the service, but need more security, performance, scalability, and availability.

To meet these needs, he points out that AlloyDB is compatible with the open source version of PostgreSQL. According to Andi Gutmans, this will make it easier for customers to adopt the service and migrate their workloads.

He adds that Google is focusing on the quality of the service, its performance and its scalability.

In addition to its distributed architecture, AlloyDB benefits from Colossus, Google’s cluster-scale file system, which already powers Spanner, Cloud SQL and Firestore.

Mechanisms to differentiate AlloyDB

Although they seem common to a good number of services, the decoupling of compute and storage and the multizone replication system are specific to AlloyDB. This is how the cloud giant intends to counter a major limitation of PostgreSQL: at some point, to scale a deployment, administrators have to create read-only copies of the database. Although standard, this technique lengthens failover time and adds latency.

Instead of copying the database, Google Cloud proposes adding several read-only replica instances alongside the primary DBMS instance that handles query processing.

This architecture relies on a storage layer distributed across a cloud region. It includes a service that writes WAL records (write-ahead logs, that is, the change sets produced by INSERT, UPDATE and DELETE statements) from the primary DBMS instance to a low-latency store. From this log store, log processing services produce database “blocks” placed in regional, sharded storage. These blocks represent the state of the database content at a given point in time, and the replay happens very quickly after each data modification. If the primary node crashes or a data block is lost, the blocks can be served again to the primary node and the replicas to avoid losing information. According to GCP, this optimizes I/O paths and makes it possible to create replicas without multiplying copies unnecessarily.
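The log-to-block mechanism can be pictured with a small Python sketch. It is purely illustrative and does not reflect Google's implementation: the names `WalRecord`, `LogStore` and `materialize_block` are hypothetical, and a real system would shard, persist and replicate all of this. The sketch only shows the principle that a block's state at a given point can be rebuilt by replaying an append-only log.

```python
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    lsn: int        # log sequence number, strictly increasing
    block_id: int   # block the change belongs to
    key: str
    value: object   # None stands for a DELETE in this sketch


@dataclass
class LogStore:
    """Append-only list standing in for the low-latency log store (hypothetical)."""
    records: list = field(default_factory=list)

    def append(self, block_id, key, value):
        record = WalRecord(len(self.records) + 1, block_id, key, value)
        self.records.append(record)
        return record.lsn


def materialize_block(log, block_id, up_to_lsn=None):
    """Replay the records of one block to rebuild its state at a given LSN."""
    state = {}
    for record in log.records:
        if up_to_lsn is not None and record.lsn > up_to_lsn:
            break  # records are ordered by LSN, so nothing later applies
        if record.block_id != block_id:
            continue
        if record.value is None:
            state.pop(record.key, None)
        else:
            state[record.key] = record.value
    return state


log = LogStore()
log.append(0, "order:1", {"status": "new"})
lsn_paid = log.append(0, "order:1", {"status": "paid"})
log.append(1, "order:2", {"status": "new"})
log.append(0, "order:1", None)  # the order is deleted

block_at_paid = materialize_block(log, 0, up_to_lsn=lsn_paid)
block_latest = materialize_block(log, 0)
```

Because the log is the source of truth in this sketch, a lost block can be rebuilt by calling `materialize_block` again, which is the intuition behind serving blocks again after a failure.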

To read data, the primary instance and the replicas use an in-memory buffer, shared between PostgreSQL processes, that holds tables, indexes, and execution plans. With this architecture, GCP claims that the database does not need to interact with the storage layer as long as a storage block has not been dropped. And if queries are too heavy for this buffer, Google Cloud has added an extra caching layer at the DBMS level.
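As a rough illustration of this read path, the Python sketch below places a small buffer in front of a larger cache, with storage consulted only when both miss. Everything here is an assumption made for the example: the class names `LruCache` and `TieredReader`, the LRU eviction policy and the sizes are hypothetical and say nothing about how AlloyDB actually manages memory.

```python
from collections import OrderedDict


class LruCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry


class TieredReader:
    """Reads a block from the buffer, then the extra cache, then storage."""

    def __init__(self, storage, buffer_size, cache_size):
        self.storage = storage
        self.buffer = LruCache(buffer_size)
        self.cache = LruCache(cache_size)
        self.storage_reads = 0

    def read(self, block_id):
        block = self.buffer.get(block_id)
        if block is None:
            block = self.cache.get(block_id)
            if block is None:
                block = self.storage[block_id]  # slowest path
                self.storage_reads += 1
                self.cache.put(block_id, block)
            self.buffer.put(block_id, block)
        return block


storage = {i: f"block-{i}" for i in range(4)}
reader = TieredReader(storage, buffer_size=2, cache_size=4)
for block_id in [0, 1, 2, 0, 1, 2]:
    reader.read(block_id)
```

In this run the working set of three blocks does not fit in the two-slot buffer, yet storage is read only three times: the second pass is absorbed by the extra cache layer.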

This architecture, which is supposed to be more efficient, does not solve all of PostgreSQL’s drawbacks. By default, the DBMS stores data in rows. Yet column-oriented storage remains more efficient for analytical workloads: it is meant to speed up queries and compress data, whereas the open source RDBMS struggles to hold up under intensive use unless it is backed by a much more expensive infrastructure.
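To illustrate why a column-oriented layout suits analytical queries, here is a minimal Python sketch with made-up data. The helper names `to_columns` and `run_length_encode` are hypothetical, and run-length encoding is only one simple example of the compression a columnar layout makes possible, not a description of any vendor's engine.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "country": "FR", "amount": 40},
    {"order_id": 2, "country": "FR", "amount": 25},
    {"order_id": 3, "country": "US", "amount": 60},
    {"order_id": 4, "country": "US", "amount": 15},
]


def to_columns(rows):
    """Pivot row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)

# An aggregate over rows has to visit every full record...
total_from_rows = sum(row["amount"] for row in rows)
# ...while the column layout only reads the one column it needs.
total_from_columns = sum(columns["amount"])

# Similar values sit next to each other in a column, so they compress well.
compressed_country = run_length_encode(columns["country"])
```

Both totals are identical. The difference lies in how much data has to be read to compute them, and it grows with the number of columns in the table.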

Fortunately, PostgreSQL supports extensions well. GCP has developed a columnar accelerator (or engine). It includes its own storage space and an optimized query engine that promises performance “up to 100 times” higher than standard Postgres. This is the mechanism that provides HTAP capabilities and lets Google compete with the services of AWS, Oracle and Microsoft. For example, the Redmond firm backs the Citus project (since acquiring the startup of the same name in 2019), which has been integrated into Azure Database for PostgreSQL and offers almost the same functionality. This is also the specialty of Swarm64, acquired by ServiceNow.

Looking to the future, Andi Gutmans said Google’s AlloyDB development team has many ideas on how to keep improving query processing and optimization.

“I think we’re off to a good start, but there are lots of other ideas about things we need to do to make the customer experience even easier,” he says.
