"Seven Databases in Seven Weeks" Reading Report
I just finished the make-up exam for the database course. I am sharing my reading report here (also hoping to pass~).
“Seven Databases in Seven Weeks” was published in 2012 and introduces seven of the most popular open-source databases: a relational database (PostgreSQL), key-value stores (Riak, Redis), a column-oriented database (HBase), document-oriented databases (MongoDB, CouchDB), and a graph database (Neo4j). Apart from PostgreSQL, the other six can collectively be called NoSQL, meaning they do not use the relational model and do not use SQL as their query language.
This book imitates the style of “Seven Languages in Seven Weeks”: one chapter per database, each divided into three sections named Day One, Day Two, and Day Three. Unlike the databases' official documentation, the book is not a simple feature tour; it discusses the core concepts behind each technology, so readers come away understanding each database's strengths and weaknesses and which database fits which requirements.
The Relational Model Hits a Wall
For many years, relational databases, represented by Oracle and MySQL, have been the de facto choice, and they are what our database courses teach. However, with the growth of the Web and of big data, the relational model has exposed a series of problems:
- The schema is hard to change. The relational model requires entities, attributes, and relationships to be specified in advance, but in the complex and fast-changing Web environment it is often necessary to add or modify attributes and relationships, and different entities may even need different sets of attributes, which the relational model does not accommodate easily.
- Strict consistency and durability sacrifice speed. Some Web applications only require so-called “eventual consistency”, that is, they can tolerate temporarily inconsistent data during processing, in which case the strict consistency model of relational databases costs speed. Similarly, some databases are used purely as caches, or are insensitive to losing a small amount of data, in which case large amounts of data can live in memory rather than being written to disk at every moment. Some NoSQL databases exploit these relaxed durability requirements, perform most operations in memory, and raise throughput by an order of magnitude.
- Strict consistency sacrifices availability or partition tolerance. The classic theorem of distributed systems is the “CAP theorem”: of Consistency, Availability, and Partition tolerance, at most two can be guaranteed at once. Relational databases usually put consistency first, while under big-data conditions partition tolerance is the primary concern. Therefore NoSQL systems often sacrifice some degree of consistency to preserve availability and partition tolerance.
The mathematically elegant relational model has hit the wall of scalability, and the NoSQL movement was born in response. NoSQL emphasizes customization: choosing the appropriate database for the actual requirements and designing an appropriate distributed architecture. From this perspective, NoSQL increases the burden on database administrators; the era of buying a few “big iron” high-performance machines and resting easy is gone. For programmers, NoSQL is generally schema-less, which seems to remove some friction, but just as with dynamically versus statically typed languages, the fewer the constraints, the easier it is for bugs to hide.
Since PostgreSQL is a classic relational database, the PostgreSQL chapter of “Seven Databases in Seven Weeks” mostly reviews the relational model, adding only PostgreSQL-specific features such as full-text search and multi-dimensional queries; I omit these from this report.
Riak
Riak is a distributed key-value database. An instance of Riak has several buckets (similar to tables in a relational database), each bucket can contain many keys (similar to rows), and a value can be data of any type (specified by the Content-Type field in the HTTP header).
The query interface is HTTP REST: CRUD (Create, Read, Update, Delete) operations are performed through HTTP POST (analogous to INSERT in SQL), GET (SELECT), PUT (UPDATE), and DELETE (DELETE). Data types and various options are also specified through HTTP headers.
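As a concrete sketch (assuming a local Riak node on its default HTTP port 8098; the animals bucket and its contents are made up for illustration):

```python
# Riak CRUD over HTTP REST -- a minimal sketch using the requests library.
# Assumes a local Riak node on the default port 8098 and the Riak 1.x-era
# /riak/<bucket>/<key> URL scheme; bucket and key names are invented.
import requests

BASE = "http://localhost:8098/riak"

# Create/update: PUT a JSON value; Content-Type tells Riak the value's type
requests.put(f"{BASE}/animals/ace",
             data='{"nickname": "The Wonder Dog"}',
             headers={"Content-Type": "application/json"})

# Read: GET the value back (analogous to SELECT)
resp = requests.get(f"{BASE}/animals/ace")
print(resp.status_code, resp.json())

# Delete: remove the key
requests.delete(f"{BASE}/animals/ace")
```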
Of course, being NoSQL, Riak does not support the relational operations, such as joins, that are central to relational databases.
Fault Tolerance
Fault tolerance is Riak's most important feature. Riak was inspired by Amazon's Dynamo paper, whose key contribution is making the CAP trade-offs (between Consistency, Availability, and Partition tolerance) tunable. Riak exposes three parameters:
- N: the number of replicas in the cluster, that is, the number of nodes that a write will eventually be copied to. Note that N can be less than the number of nodes in the cluster.
- W: the number of nodes that must be written successfully before a write operation returns. If W is less than N, the operation reports success once W nodes have been written, while the remaining N-W nodes are still copying the data.
- R: the number of nodes required for a successful read, that is, data is read from R nodes and the copy with the latest timestamp wins (details later). Why read from R nodes instead of 1? Because when W is less than N, a write may report success before it has fully propagated, and a subsequent read may land on a node that still holds the old value. If R+W>N, the set of nodes read necessarily overlaps the set of nodes written, so the read cannot return only the old value; for example, with N=3, W=2, R=2, any two nodes read must include at least one of the two nodes written.
The N, R, and W parameters have some classic configurations:
- W=N, R=1: the relational-database approach; consistency is ensured by completing the write everywhere before returning, but writes are relatively slow.
- W=1, R=N: makes writes fastest, at the expense of read speed.
- W=R=N/2+1: spreads the latency between reads and writes.
It is worth mentioning that N, R, and W can be set per key, and R can even be specified on each individual read request (GET). This insight from the Amazon Dynamo paper makes Riak flexible enough to handle different consistency requirements and data with different read/write frequencies.
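For illustration, here is how the quorums might be tuned per request, assuming the Riak 1.x HTTP interface, which accepted r and w as query-string parameters:

```python
# Per-request quorum tuning -- a sketch assuming Riak 1.x's HTTP API,
# which accepts r/w as query-string parameters on individual requests.
import requests

BASE = "http://localhost:8098/riak"

# Fast, possibly stale read: ask only one replica (R=1)
requests.get(f"{BASE}/animals/ace", params={"r": 1})

# More durable write: wait until two replicas confirm (W=2)
requests.put(f"{BASE}/animals/ace",
             data='{"nickname": "The Wonder Dog"}',
             headers={"Content-Type": "application/json"},
             params={"w": 2})
```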
Vector Clock
As mentioned above, when R>1 we must decide which of the versions read back is the latest. This is a fundamental problem in distributed systems. The simplest solution is to synchronize all machines' clocks precisely and stamp each datum with a timestamp. Google's Spanner (OSDI 2012 Best Paper) does this, using atomic clocks and GPS to keep the clock uncertainty between any two machines worldwide within about 10 ms. Of course, when the English edition of “Seven Databases in Seven Weeks” came out, Spanner may not yet have been published.
Riak's vector clocks work a bit like Git. On every update, a pseudo-random stamp is appended to the key's vector clock, so the vector grows in length. When two nodes synchronize: if their vectors are identical, there is nothing to do; if one vector is a subset of the other, a one-way copy suffices (like a fast-forward update in Git); if neither vector is a subset of the other, then a client querying this key receives both values, and it can resolve the conflict itself and submit a new value and vector (like resolving a merge conflict in Git).
As updates accumulate, the vector keeps growing, so Riak provides a pruning option that trims old entries from the vector clock (for example, those more than a day old).
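The comparison rule can be illustrated with a toy model that treats a vector clock simply as the set of stamps it has accumulated; this sketches the logic only and is not Riak's actual data structure:

```python
# A toy illustration of the vector-clock comparison rule, not Riak's
# implementation: a clock is modeled as the set of update stamps it has seen.
def compare(a: set, b: set) -> str:
    if a == b:
        return "in sync"            # nothing to do
    if a <= b:
        return "fast-forward a"     # b is strictly newer: one-way copy
    if b <= a:
        return "fast-forward b"
    return "conflict"               # divergent histories: client must merge

node1 = {"stamp1", "stamp2"}
node2 = {"stamp1", "stamp2", "stamp3"}   # node2 saw one more update
node3 = {"stamp1", "stamp4"}             # node3 diverged

print(compare(node1, node2))  # fast-forward a
print(compare(node2, node3))  # conflict -> both values returned to the client
```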
MapReduce
Several of the databases in this book ship a MapReduce framework. MapReduce splits a problem into two parts: first, a map function transforms a list of data into another list, possibly of a different type; second, a reduce function collapses the list produced by map into one or a few scalar values. This pattern lets the system break a task into smaller pieces and run those pieces in parallel across a large cluster.
In Riak, you can write MapReduce tasks in JavaScript. You can also save a JavaScript map function into the database as a stored procedure and reference it in later MapReduce requests.
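A sketch of submitting such a job over HTTP, based on the /mapred endpoint of Riak's 1.x-era API; the bucket name and the JavaScript map function are invented, while Riak.reduceSum is one of Riak's built-in reduce functions:

```python
# Submitting a MapReduce job to Riak's /mapred endpoint -- a sketch based on
# Riak 1.x's HTTP API; bucket name and JS source are illustrative.
import json
import requests

job = {
    "inputs": "animals",  # run over every key in the bucket
    "query": [
        # map: emit the number 1 for each stored document
        {"map": {"language": "javascript",
                 "source": "function(v) { return [1]; }"}},
        # reduce: sum the emitted numbers using a Riak built-in
        {"reduce": {"language": "javascript", "name": "Riak.reduceSum"}},
    ],
}
resp = requests.post("http://localhost:8098/mapred",
                     data=json.dumps(job),
                     headers={"Content-Type": "application/json"})
print(resp.json())  # e.g. [42] -- the number of documents in the bucket
```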
HBase
HBase is a column-oriented database based on the ideas of Google's BigTable and mainly aimed at big-data problems. HBase is built on Hadoop, a scalable computing platform that provides a distributed file system and MapReduce. HBase suits back-end OLAP-style workloads: it is relatively slow at individual operations but excels at scanning huge data sets.
Data Structure
HBase stores data in tables; each record is a row, and a row can have several column families. Each column family is a hashmap, and each entry in the hashmap is called a cell. This is one more layer than relational databases: a basic property of the relational model is that the cell at the intersection of a row and a column is atomic, whereas in HBase the data at the intersection of a row and a column family is a hashmap, and HBase indexes the hashmap's keys.
This structure is well suited to inverted indexes over text: each row is an article, there is a single column family, and the hashmap uses each word of the text as a key and its number of occurrences as the value. Once HBase has indexed it, it is easy to find in which article a given word occurs most often.
Link information on the Web also fits this structure well: each row is a page, the anchor text of each link is a key in the hashmap, and the link target is the value.
Why multiple column families? To avoid creating multiple tables for different kinds of information. For example, an article has an author, title, body, and comments. The author and title can live in one column family, but if the body and the comments are to be inverted-indexed, putting them in that same family would cause a mess, so it is best to create three column families: basic information (author, title), a body index, and a comment index, as sketched below.
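A sketch of that three-family layout, using the third-party happybase Python client (which talks to HBase through its Thrift gateway; table and family names are illustrative):

```python
# Creating the three-column-family layout described above -- a sketch using
# the happybase client against a local HBase Thrift server; names are made up.
import happybase

conn = happybase.Connection("localhost")  # assumes a local Thrift gateway
conn.create_table("articles", {
    "info": {},         # basic information: author, title
    "content_idx": {},  # inverted index over the body
    "comment_idx": {},  # inverted index over the comments
})

table = conn.table("articles")
table.put(b"article-1", {
    b"info:author": b"alice",
    b"info:title": b"Hello HBase",
    b"content_idx:hello": b"1",   # word -> occurrence count
    b"content_idx:hbase": b"3",
})
print(table.row(b"article-1", columns=[b"info"]))
```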
If you don't need to fetch entire rows frequently, it is advisable to split different column families into different tables, so the cluster can partition regions more effectively.
HBase natively supports versioning, identifying data versions with millisecond-precision timestamps, which makes it a good fit for applications such as wikis that need to keep historical versions.
Data Partition
HBase operations go through Hadoop's Java API; the book drives them from the JRuby-based HBase shell.
HBase splits each table into several non-overlapping regions by row-key range, and each region is assigned to a specific region server in the cluster. The special .META. table records, for each table, which region server serves which region.
If a region server fails, the master reassigns the failed node's regions to other region servers. If the master itself fails, a standby takes over its responsibilities.
Compared with Riak, HBase's availability is not as high. With reasonable parameter configuration, Riak can keep working even when only a single node is alive, whereas HBase can only survive the loss of one or two nodes.
Another point worth noting is that HBase scales up easily but scales down with difficulty, so it is like a “nail gun”: capable of big jobs, but be careful not to hurt your fingers.
MongoDB
Relational databases offer powerful queries; databases like Riak and HBase are built to be distributed; MongoDB finds a sweet spot between the two. That is why MongoDB is the persistent store adopted by many emerging Web applications.
Data Structure
MongoDB consists of several collections (similar to tables in a relational database); each collection holds documents (similar to rows), and each document is a JSON object. Unlike a relational database, a JSON object has no fixed schema, and values can nest to any depth (that is, the value for a key in a JSON object can itself be a JSON object).
Each document contains an _id field that uniquely identifies it. The _id is generated automatically from a 4-byte timestamp, a 3-byte machine ID, a 2-byte process ID, and a 3-byte counter, which together guarantee uniqueness.
Data that would require a table join (JOIN) in a relational database is often expressed with sub-documents in MongoDB. For example, an article has comments. In the relational model you need an article table and a comment table connected by a foreign key; in MongoDB you only need a collection of articles, with each article as a document and one of its fields holding all of that article's comments as embedded sub-documents.
This structure is arguably more natural than the relational model. If you need to cut across the tree, say, to query all comments a given author has ever posted, you can create an index for it, and the search will be no slower than in a relational database. MongoDB's indexes can use B-trees, two-dimensional indexes, or spherical geospatial indexes.
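A minimal pymongo sketch of the embedded-comments pattern and the cross-tree index; database, collection, and field names are made up:

```python
# An article with embedded comments, plus an index that cuts across the tree.
from pymongo import MongoClient

db = MongoClient("localhost", 27017).blog

db.articles.insert_one({
    "title": "Why NoSQL",
    "author": "alice",
    "comments": [  # sub-documents instead of a JOINed comments table
        {"author": "bob",   "text": "Nice post"},
        {"author": "carol", "text": "I disagree"},
    ],
})

# Index into the embedded documents, so "all comments by bob" stays fast
db.articles.create_index("comments.author")
for doc in db.articles.find({"comments.author": "bob"}):
    print(doc["title"])
```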
Query
MongoDB's mother tongue is JavaScript, and since JavaScript natively speaks JSON, data manipulation feels very natural.
MongoDB's most basic query method is find. Like SQL, it can:
- Test for equality, greater than, less than, membership in a set, and so on
- Match regular expressions (a feature not available in standard SQL)
- Combine the above conditions with Boolean operators (and, or, not)
- Choose which fields appear in the result set (the projection of relational algebra)
- Apply aggregation functions (such as count, distinct)
- Group results (GROUP BY), optionally with a custom reduce function
Of course, constrained by the JavaScript language, query conditions must be written as JSON objects, which is less intuitive than SQL.
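Here are the query styles from the list above written as pymongo calls (collection and field names are illustrative):

```python
# The query styles listed above, expressed with pymongo.
from pymongo import MongoClient

db = MongoClient().blog

# equality, ranges, set membership
db.articles.find({"author": "alice"})
db.articles.find({"votes": {"$gt": 10, "$lte": 100}})
db.articles.find({"tag": {"$in": ["nosql", "mongodb"]}})

# regular expressions
db.articles.find({"title": {"$regex": "^Why", "$options": "i"}})

# boolean combinations
db.articles.find({"$or": [{"author": "alice"}, {"votes": {"$gt": 10}}]})

# projection (second argument) and aggregation helpers
db.articles.find({}, {"title": 1, "_id": 0})
print(db.articles.count_documents({"author": "alice"}))
print(db.articles.distinct("author"))
```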
Appending .explain() to a query reports whether an index was used and other details, much like MySQL's EXPLAIN statement.
MongoDB provides atomic increment and decrement commands, as well as an atomic operation that updates a document and returns its old value.
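A pymongo sketch of these atomic operations, reusing the hypothetical articles collection:

```python
# Atomic increment and atomic "update and return the old value".
from pymongo import MongoClient, ReturnDocument

db = MongoClient().blog

# $inc modifies the counter atomically on the server
db.articles.update_one({"title": "Why NoSQL"}, {"$inc": {"votes": 1}})

# find_one_and_update atomically updates and hands back one version of the doc
old = db.articles.find_one_and_update(
    {"title": "Why NoSQL"},
    {"$inc": {"votes": 1}},
    return_document=ReturnDocument.BEFORE,  # return the pre-update document
)
print(old["votes"])
```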
MongoDB also provides a MapReduce JavaScript API.
Replication and Sharding
MongoDB increases redundancy in a way similar to hot standby in relational databases. A MongoDB cluster automatically “elects” a master node to handle all reads and writes, and the slave nodes automatically replicate data from the master. For elections to work there should always be an odd number of nodes; otherwise, if the network splits the cluster into two equal halves, neither half can elect a master and service stops. A node can only be elected with the votes of a majority of the set, so a lone surviving node of a larger cluster will not promote itself.
When the data set grows too large for a single node, it can be sharded horizontally by value ranges. A mongos server acts as a “reverse proxy”: it is the single point that receives client requests, maintains the mapping of the data set across the mongod nodes, and routes requests accordingly.
As with RAID, replication and sharding can be combined, to add redundancy and to improve performance respectively. For example, three servers can form a replica set, two such replica sets can serve as two shards, and together with a config server and a mongos “reverse proxy” they form an 8-server MongoDB cluster.
CouchDB
CouchDB is very similar to MongoDB. Their two main differences are:
- CouchDB is more lightweight and can be embedded in applications, like SQLite.
- CouchDB’s data comes with version numbers.
Transaction
Neither MongoDB nor CouchDB provides locks or transactions. In MongoDB, the gap is usually covered by the built-in atomic operations; CouchDB's secret to avoiding conflicts is to guarantee that only the latest version of a document can be modified. CouchDB gives every document a _rev field (a monotonically increasing version number plus a pseudo-random string) that is refreshed on every change. Update (PUT) and delete (DELETE) operations must supply _rev along with _id, and if the _rev does not match, the operation is rejected. A rejection typically means the document was modified by someone else between the time it was read and the attempted update or delete, so its version number moved on.
When deleting a document, CouchDB does not modify the document in place, but “updates” it with a new blank document.
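A sketch of the _rev dance over HTTP, assuming a local CouchDB on the default port 5984 and a hypothetical articles database (add credentials if your server requires them):

```python
# CouchDB's optimistic concurrency -- a minimal sketch with requests.
import requests

DB = "http://localhost:5984/articles"
requests.put(DB)  # create the database if it does not exist yet

# create: the response carries the document's first _rev
rev = requests.put(f"{DB}/doc1", json={"title": "v1"}).json()["rev"]

# update: must quote the _rev we read; a stale _rev gets HTTP 409 Conflict
resp = requests.put(f"{DB}/doc1", json={"title": "v2", "_rev": rev})
print(resp.status_code)          # 201 if our _rev was current
rev = resp.json()["rev"]

# delete: same rule, with _rev passed in the query string
requests.delete(f"{DB}/doc1", params={"rev": rev})
```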
Special keys in MongoDB start with $, and special keys in CouchDB start with _.
In addition to accessing CouchDB with a JavaScript API, you can also use an HTTP REST API similar to Riak's.
View
Views in CouchDB are implemented with JavaScript functions. Typically such a function performs a “full table scan”, executing the map function on every document and collecting the results. HBase generates views in a similar way. The approach is flexible, but a full scan is expensive, so HBase allows filter conditions that pre-select qualifying documents before they reach the map function; CouchDB has no such filters, but it caches map results, so each subsequent query of a view only has to run the map function over the documents that changed.
In addition to map-based views, CouchDB also supports reduce.
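A sketch of defining and querying a map/reduce view over HTTP; the design-document and view names are invented, while _count is one of CouchDB's built-in reduce functions:

```python
# Defining and querying a map/reduce view via a design document.
import requests

DB = "http://localhost:5984/articles"

design = {
    "views": {
        "by_author": {
            "map": "function(doc) { if (doc.author) emit(doc.author, 1); }",
            "reduce": "_count",   # built-in reduce: count rows per key
        }
    }
}
requests.put(f"{DB}/_design/stats", json=design)

# group=true applies the reduce per distinct key -> documents per author
resp = requests.get(f"{DB}/_design/stats/_view/by_author",
                    params={"group": "true"})
print(resp.json()["rows"])
```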
Change Push
In web applications, it is often necessary to know whether a certain specific value has changed. In relational databases, this is often implemented with triggers.
In CouchDB you can write JavaScript code that listens to a database; the listener is invoked on every write to that database. Adding a filter to the listener narrows it to a subset of changes, for example only document deletions, or only documents matching certain conditions.
CouchDB's change push pairs well with Node.js's event-driven model, using long polling: send a listening request to CouchDB, and a callback fires as events occur, receiving chunks of data from CouchDB; once the data is complete, parse it and call the map function on each document.
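A long-polling consumer of the _changes feed, sketched in Python rather than the Node.js setup the book describes; the database name is made up:

```python
# Watching CouchDB's _changes feed with long polling.
import requests

DB = "http://localhost:5984/articles"
since = 0
while True:
    # the request blocks until something changes since the last sequence
    resp = requests.get(f"{DB}/_changes",
                        params={"feed": "longpoll", "since": since,
                                "include_docs": "true"})
    body = resp.json()
    for change in body["results"]:
        print(change["id"], "deleted" if change.get("deleted") else "updated")
    since = body["last_seq"]  # resume from where we left off
```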
Replication
CouchDB provides redundancy in a way similar to Riak: every node can accept reads and writes independently. A conflict shows up as the same _id with different _rev values on two nodes. CouchDB's resolution is blunt but effective: it picks one of the two _revs, and the choice is deterministic rather than random, so all nodes pick the same winner. Since CouchDB keeps a document's revision history, a client that disagrees with the outcome can read the older revision and recover from it.
Neo4j
Neo4j is a “graph database”: it stores a directed graph in the mathematical sense. Neo4j's focus is on storing the relationships between data. Unlike a relational database, Neo4j does not require relationship schemas to be declared in advance, which makes it easy to modify relationships and to store “special case” relationships.
The recommended language for Neo4j is Gremlin, a domain-specific graph-traversal language implemented in Groovy that uses standard graph-theory vocabulary. Assuming g is a graph object and alice is a vertex in the graph, then:
- g.V represents all vertices
- g.E represents all edges
- g.v(0) represents the 0th vertex
- g.v(0).map() can list the properties of the 0th vertex
- g.V.filter{it.name=='riesling'} selects the vertices satisfying the given condition
- g.v(0).outE gets the set of outgoing edges of the 0th vertex
- g.v(0).outE.inV can find the set of vertices pointed to by the 0th vertex
- g.v(0).inE.outV is the set of vertices pointing to the 0th vertex
- g.v(0).inE.outV.next() gets the first vertex of the above set
- alice.bothE('friends').bothV.except([alice]).name gets the names of all of Alice's friends (regardless of the direction of the friends edges)
- alice.bothE('friends').bothV.except([alice]).loop(3){ it.loops<=2 }.dedup.name gets the friends of Alice's friends' friends (loop(3) repeats the previous three steps bothE('friends').bothV.except([alice]); loops<=2 makes it loop twice; dedup removes duplicates)
As can be seen from the above, the power of Gremlin lies in the “pipeline” operation on collections, which is very similar to jQuery’s operation on DOM elements.
Neo4j supports ACID transactions in single-node mode. In the multi-node high-availability mode, writes to one node are not immediately synchronized to the others, so transactions cannot be ACID across the cluster.
Redis
Redis's biggest feature is speed. It is essentially a key-value store, but it also provides higher-level structures such as sets, queues, stacks, and publish-subscribe channels. Redis is generally used as a cache.
Redis uses SET to write a key and GET to read one, with MSET and MGET handling several keys at once. Redis's atomic data type is the string, but it recognizes integers: INCR atomically increments an integer-valued key.
Like PostgreSQL and Neo4j, Redis supports transactions: MULTI enters a transaction, EXEC commits it, and DISCARD abandons it, much like transactions in a relational database.
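A small sketch using the redis-py client; a pipeline created with transaction=True wraps its queued commands in MULTI ... EXEC (key names are illustrative):

```python
# MULTI/EXEC from redis-py.
import redis

r = redis.Redis(host="localhost", port=6379)

pipe = r.pipeline(transaction=True)
pipe.set("account:alice", 90)
pipe.set("account:bob", 110)
pipe.execute()   # both writes commit together

print(r.get("account:alice"))  # b'90'
r.incr("page:views")           # atomic increment, no transaction needed
```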
Redis also supports the following advanced data structures:
Hash Table
A hash is a bit like a nested document in MongoDB or CouchDB: the value stored under a key is itself a key-value map. Unlike documents, though, hashes cannot nest any deeper.
List
A list holds multiple ordered values and can serve as a queue or a stack, since both ends support PUSH and POP. LRANGE retrieves any slice of a list, and LREM deletes matching values.
The real power of lists is the blocking pop, which can support a messaging system: when the queue is empty, BRPOP blocks until an element is inserted.
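A tiny work queue sketched with redis-py: producers LPUSH jobs and a worker BRPOPs them (key names are made up):

```python
# A minimal work queue on a Redis list.
import redis

r = redis.Redis()

r.lpush("jobs", "send-email:42")      # producer side

# consumer side: blocks for up to 5 seconds waiting for work
item = r.brpop("jobs", timeout=5)
if item is not None:
    queue, payload = item             # BRPOP returns (key, value)
    print("working on", payload)
```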
Set
The difference between a set and a list is that a set stores unordered, non-duplicate values. Redis supports intersection, union, and difference operations between sets.
Ordered Set
Hash tables, lists, and sets are built into high-level programming languages; ordered sets are not. Suppose we need to track the number of visits to each post and keep the most-visited posts at the front. A relational database can do it, but the performance of its update operations is often unflattering. Hand-rolling a binary search tree or a heap works too, but then there is no persistence.
With Redis, the ZADD command does the job, taking a score and a member as parameters, and the ZRANGE command returns the sorted members within a given range.
Ordered sets get more powerful still: you can build something like a “view”, defined as the union or intersection of several ordered sets. Each input set can be given a weight (multiplied into its scores), and the minimum, maximum, or sum of the weighted scores becomes the item's score in the “view”. This makes a simple ranking system easy; “popular posts”, for instance, can combine visit counts, reply counts, likes, and so on, as sketched below.
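A sketch of such a weighted “view” with redis-py; the key names and weights are invented:

```python
# A "popular posts" score combining visits and likes.
import redis

r = redis.Redis()

r.zadd("visits", {"post:1": 100, "post:2": 30})
r.zadd("likes",  {"post:1": 2,   "post:2": 25})

# weighted union: popularity = 1 * visits + 10 * likes
r.zunionstore("popular", {"visits": 1, "likes": 10}, aggregate="SUM")

# highest score first
print(r.zrevrange("popular", 0, -1, withscores=True))
```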
Publish-Subscribe System
Linux does not natively provide a publish-subscribe mechanism, yet the pattern is extremely common, so in practice people either implement a socket server or use a messaging library such as ZeroMQ. Redis offers an elegant alternative: SUBSCRIBE to receive messages and PUBLISH to send them. Every published message is pushed to all subscribers currently listening.
The difference from the blocking queue pop described earlier: with multiple simultaneous subscribers, publish-subscribe delivers each message to every subscriber, while a blocking queue hands each message to just one of the waiting “subscribers”.
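A minimal redis-py sketch; in practice the subscriber and the publisher would be separate processes (the channel name is illustrative):

```python
# Publish/subscribe with redis-py.
import redis

r = redis.Redis()

p = r.pubsub()
p.subscribe("news")

r.publish("news", "hello subscribers")   # every current subscriber gets this

for message in p.listen():
    if message["type"] == "message":     # skip the subscribe confirmation
        print(message["data"])           # b'hello subscribers'
        break
```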
Persistence
Redis is fast because all of its operations happen in memory. It offers two persistence mechanisms: an append-only log of operations (AOF) and periodic snapshots of memory to disk.
Since Redis often serves as a cache, you can also give any key a TTL (time to live); when the time is up, Redis automatically evicts the key.
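For example, with redis-py (key names are illustrative):

```python
# Expiring keys: set a TTL at write time or attach one afterwards.
import redis

r = redis.Redis()

r.set("session:alice", "token", ex=60)  # expire in 60 seconds
r.expire("cache:home", 300)             # or add a TTL to an existing key
print(r.ttl("session:alice"))           # seconds remaining, e.g. 60
```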
Conclusion
The six non-relational databases introduced above can, combined sensibly, form a highly available, scalable, and high-performance storage system:
- MongoDB / CouchDB for general data persistence;
- Redis as a cache for MongoDB / CouchDB and temporary storage for Session data;
- Neo4j for storing complex inter-entity relationships;
- HBase for big-data analysis such as OLAP;
- Riak for data that requires high availability.
Finally, the advantages and disadvantages of the seven databases covered in the book are compared in the table below:

| Database | Advantages | Disadvantages |
| --- | --- | --- |
| MongoDB | Complex queries | Embedding capabilities |
| CouchDB | Easy to embed | Complex queries |
| Riak | High availability | Complex queries |
| Redis | Fast | Complex data structures |
| PostgreSQL | Strict relational model | Distributed availability |
| Neo4j | Flexibly represents relationships | Big data |
| HBase | Big data, integrated with Hadoop | Fast queries |