SQL vs. NoSQL
I came across this topic when evaluating data storage methods for a system design challenge. and realize it’s would be beneficial to learn the difference systematically. This is my learning notes based on “grokking system design”.
A design choice example - what to use?
Take the URL shortening service as an example. This service will provide short aliases redirecting to long URLs.
Nature of the data:
- Billions of records
- Each object is small
- No relationship between records except user-URL pairs
Two tables, one for URL mappings, one for user’s data who created the short link.
Now the question is, what database to use?
What is SQL
- SQL uses a relational database, which is structured and has predefined schemas.
- Each row contains all the information about one entity and each column contains all the separate data points
- Popular choices: MySQL, Oracle, MS SQL Server, SQLite, Postgres, Maria DB
What is NoSQL
- Non-relational databases are unstructured, distributed, and have a dynamic schema like file folders that hold everything.
- Common types of NoSQL:
- Key-Value Stores: Data is stored in an array of key-value pairs. Common choices: Redis, Voldemort, and Dynamo.
- Document Databases: Data is stored in documents and these documents are grouped together in collections. Each document can have an entirely different structure. Common choices: CouchDB and MongoDB.
- Wide-Column Databases: Instead of ‘tables,’ in columnar databases we have column families, which are containers for rows. Unlike relational databases, we don’t need to know all the columns upfront and each row doesn’t have to have the same number of columns. Columnar databases are best suited for analyzing large datasets - big names include Cassandra and HBase.
- Graph Databases: Data is saved in graph structures with nodes (entities), properties (information about the entities), and lines (connections between the entities). Examples of graph databases include Neo4J and InfiniteGraph.
Storage: SQL use tables. NoSQL databases have different data storage models.
Schema: SQL uses a fixed schema. NoSQL uses a dynamic schema.
Query: SQL database use SQL for defining and manipulating data. NoSQL databases use different syntax, which is sometimes called UnQL (Unstructured Query Language).
Scalability: SQL databases are vertically scalable, mostly done by increasing the capacity of a server, which is expensive. NoSQL databases are horizontally scalable, meaning we can add more cheap commodity hardware easily in our NoSQL database infrastructure to handle a lot of traffic.
Reliability: When it comes to data reliability and safe-guarantee of performing transactions, SQL databases are still the better bet. Most of the NoSQL solutions sacrifice reliability for performance and scalability.
Which one to use
Reasons to use SQL database
- Need reliability and data consistency, like stock trading and banking. Can prescribe exactly how tractions interact with the database.
- Need to handle relationships, like social media.
- Data is structured and unchanging
Reasons to use NoSQL database
- Handling a large number of read and write operations, since NoSQL can add nodes on the fly and handle rapid development
- Storing large volumes of data that often have little to no structure
- Making the most of cloud computing and storage.
- Running data analytics, where we have to deal with an influx of massive amounts of data.
Revisit the example: which one to use
“Since we anticipate storing billions of rows, and we don’t need to use relationships between objects – a NoSQL key-value store like DynamoDB, Cassandra or Riak is a better choice. A NoSQL choice would also be easier to scale. Please see SQL vs NoSQL for more details.”