IndyNDA Wrap-up: MongoDB presented by Dennis Burton
At IndyNDA last night we heard Dennis Burton speak on MongoDB. Personally, I have never worked with any data store other than relational databases (e.g. SQL Server, MySQL). Dennis did a great job laying out the pluses and minuses of using a document-oriented database.
Document-oriented databases are a viable solution for web applications that need to scale quickly with tons of data. We can all think of large websites that hold a huge amount of information to be retrieved. Relational databases are difficult to scale across many servers. A key-value system such as memcached is super fast, but it keeps its data in memory and has to be updated whenever the underlying data changes. Document-oriented databases serve as a solid middle ground between key-value and relational architectures.
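To make the "middle ground" idea a bit more concrete, here is a minimal sketch in Python that uses plain dictionaries as stand-ins. The names (kv_store, kv_get, documents, find) are hypothetical and are not the memcached or MongoDB client APIs. The point is only that a key-value store hands back an opaque value for an exact key, while a document store understands the fields inside each document and can filter on them.

```python
import json

# Key-value style (hypothetical stand-in): values are opaque strings,
# and the only way to get one back is by its exact key.
kv_store = {}
kv_store["user:1"] = json.dumps({"name": "Ann", "city": "Indianapolis"})
kv_store["user:2"] = json.dumps({"name": "Bob", "city": "Chicago"})

def kv_get(key):
    """Fetch a value by exact key; the store never looks inside the value."""
    return kv_store.get(key)

# Document style (hypothetical stand-in): each document's fields are visible.
documents = [
    {"_id": 1, "name": "Ann", "city": "Indianapolis"},
    {"_id": 2, "name": "Bob", "city": "Chicago"},
]

def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]

print(kv_get("user:1"))                       # {"name": "Ann", "city": "Indianapolis"}
print(find(documents, city="Indianapolis"))   # [{'_id': 1, 'name': 'Ann', 'city': 'Indianapolis'}]
```

In the MongoDB shell the equivalent field query would look like `db.users.find({city: "Indianapolis"})`, assuming a collection named users.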
Because of this, we see large web companies develop or use alternative data stores to fit their needs. Google uses BigTable, Facebook uses Cassandra, and LinkedIn is using Project Voldemort. Even Microsoft is getting into this market with their Azure services.
Dennis pointed to Nathan Hurst’s post on the CAP Theorem. In this post he has a diagram that groups NoSQL systems by the qualities they provide. The theorem states that a distributed data store cannot provide all three of Consistency, Availability, and Partition Tolerance at the same time. You can provide two of the three, and Hurst’s post places each implementation in its respective two-feature area.
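The trade-off is easiest to see during a network partition. Below is a toy model in Python; the Replica class, the write helper, and the "CP"/"AP" mode labels are made up for illustration and do not correspond to any real database's API. It writes a value, cuts one replica off, writes again, and then reads from the isolated replica: the CP variant refuses to answer rather than return stale data, while the AP variant stays available and returns the old value.

```python
class Replica:
    """Toy replica; mode is "CP" (favor consistency) or "AP" (favor availability)."""

    def __init__(self, mode):
        self.mode = mode
        self.value = None
        self.partitioned = False  # True when cut off from the other replica

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Stay consistent by refusing to serve possibly stale data.
            raise RuntimeError("unavailable during partition")
        # AP, or no partition: always answer, even if the value may be stale.
        return self.value

def write(primary, secondary, value):
    """Write to the primary and copy to the secondary only if it is reachable."""
    primary.value = value
    if not secondary.partitioned:
        secondary.value = value

def read_from_cut_off_replica(mode):
    """Write v1, partition the secondary, write v2, then read from the secondary."""
    primary, secondary = Replica(mode), Replica(mode)
    write(primary, secondary, "v1")
    secondary.partitioned = True
    write(primary, secondary, "v2")  # never reaches the secondary
    try:
        return secondary.read()
    except RuntimeError as err:
        return f"error: {err}"

print(read_from_cut_off_replica("AP"))  # v1
print(read_from_cut_off_replica("CP"))  # error: unavailable during partition
```

Since a distributed system cannot rule out partitions, the practical choice this sketch highlights is between consistency and availability while a partition lasts.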
Joins show one of the problems with relational databases that document-oriented databases answer. What happens when a join needs to span tables on different servers? Document-oriented databases keep all related information about a “subject” together in one document, almost removing the need for a join in the first place.
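Here is a rough sketch of the difference in Python, using hypothetical in-memory data rather than a real database. In the relational version, orders live in their own table and point back to a user id, so building a user's order list means combining two tables (and, if those tables sat on different servers, talking to both). In the document version, the orders are embedded in the user document, so a single lookup returns everything.

```python
# Relational style (hypothetical data): two tables linked by user_id.
users = [{"id": 1, "name": "Ann"}]
orders = [
    {"id": 10, "user_id": 1, "item": "book"},
    {"id": 11, "user_id": 1, "item": "lamp"},
]

def user_with_orders_relational(user_id):
    """Emulate a join: find the user, then scan orders for a matching user_id."""
    user = next(u for u in users if u["id"] == user_id)
    items = [o["item"] for o in orders if o["user_id"] == user_id]
    return {"name": user["name"], "items": items}

# Document style (hypothetical data): orders embedded inside the user document.
user_documents = {
    1: {"_id": 1, "name": "Ann", "orders": [{"item": "book"}, {"item": "lamp"}]},
}

def user_with_orders_document(user_id):
    """One lookup returns everything; there is no second table to join."""
    doc = user_documents[user_id]
    return {"name": doc["name"], "items": [o["item"] for o in doc["orders"]]}

print(user_with_orders_relational(1))  # {'name': 'Ann', 'items': ['book', 'lamp']}
print(user_with_orders_document(1))    # {'name': 'Ann', 'items': ['book', 'lamp']}
```

The trade-off is that embedded data is duplicated or awkward to query when it is shared by many subjects, which is why the text says "almost" removing the need for joins.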
There are some quirky things to get used to when using MongoDB (if you are used to relational databases). With MongoDB, a write lands on the master right away, but reads from the replicas are only eventually consistent. This is different from the relational world, where the database locks information being written so that it cannot be read or manipulated until the lock is released. In MongoDB, the database writes the data, but a read at the same time may return old information until the change has propagated to every copy. That is just fine for a Twitter-like app, but not so good for ATM software.
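A small simulation can show where the "old information" comes from. This is a hypothetical Python model of a master-slave pair with asynchronous replication, not MongoDB's actual replication code: writes land on the master immediately and are queued, and the slave only sees them after a replication step runs. Until then, a read from the slave returns stale (here, missing) data.

```python
from collections import deque

class MasterSlave:
    """Toy master-slave pair with asynchronous replication (hypothetical model)."""

    def __init__(self):
        self.master = {}
        self.slave = {}
        self.pending = deque()  # writes applied on the master but not yet copied

    def write(self, key, value):
        self.master[key] = value           # the write lands on the master immediately
        self.pending.append((key, value))  # the slave will catch up later

    def read_master(self, key):
        return self.master.get(key)

    def read_slave(self, key):
        return self.slave.get(key)         # may be stale until replicate() runs

    def replicate(self):
        """Apply queued writes to the slave, oldest first."""
        while self.pending:
            key, value = self.pending.popleft()
            self.slave[key] = value

db = MasterSlave()
db.write("status", "Hello IndyNDA")
print(db.read_master("status"))  # Hello IndyNDA
print(db.read_slave("status"))   # None (not replicated yet)
db.replicate()
print(db.read_slave("status"))   # Hello IndyNDA
```

For a status feed, briefly seeing an older post is harmless; for an account balance, it is exactly the problem the ATM example points at.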
Dennis also described how flexible MongoDB is when it comes to scaling. MongoDB can be switched into a master-slave setup (on different machines or on the same machine) with a single command, which replicates the database across multiple instances. It can also shard the data across multiple instances, splitting the database so that each instance stores only part of it. The two approaches can also be combined. For example, with four instances, two could hold sharded user data, with half of the users on one instance and half on the other, while the other two are replicas of those two shards so that you have redundancy.
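The four-instance example can be sketched in a few lines of Python. Everything here is hypothetical: the Instance and Shard classes, the SPLIT_ID boundary, and the routing function are illustrations, not MongoDB's API. Real MongoDB routes requests through a mongos process and splits data by ranges of a shard key; this sketch mimics that with a simple range check on the user id. To keep it short, the replica is updated synchronously on every insert, unlike the asynchronous lag discussed earlier.

```python
class Instance:
    """One hypothetical database instance holding documents by _id."""
    def __init__(self, name):
        self.name = name
        self.data = {}

class Shard:
    """A shard = a primary instance plus a replica that mirrors every insert."""
    def __init__(self, name):
        self.primary = Instance(name + "-primary")
        self.replica = Instance(name + "-replica")

    def insert(self, user):
        self.primary.data[user["_id"]] = user
        self.replica.data[user["_id"]] = dict(user)  # redundant copy

SPLIT_ID = 1000  # hypothetical range boundary on the shard key

# Four instances in total: two shards, each with its own replica.
shards = {"A": Shard("shard-a"), "B": Shard("shard-b")}

def route(user_id):
    """Range-based routing: ids below SPLIT_ID go to shard A, the rest to B."""
    return shards["A"] if user_id < SPLIT_ID else shards["B"]

def insert_user(user):
    route(user["_id"]).insert(user)

def find_user(user_id, primary_up=True):
    """Read from the shard's primary, or from its replica if the primary is down."""
    shard = route(user_id)
    node = shard.primary if primary_up else shard.replica
    return node.data.get(user_id)

for uid in (1, 42, 1500, 2000):
    insert_user({"_id": uid, "name": f"user{uid}"})

print(len(shards["A"].primary.data), len(shards["B"].primary.data))  # 2 2
print(find_user(1500, primary_up=False))  # {'_id': 1500, 'name': 'user1500'}
```

Sharding spreads the storage load, while the replicas mean that losing one primary does not take half of the users offline.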
You can find MongoDB at http://www.mongodb.org.