Monday, 21 October 2013

BangDB vs LevelDB - Performance Comparison

This post compares the performance of BangDB and LevelDB. A high-level overview of the two dbs follows.

LevelDB

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. LevelDB is based on an LSM (Log-Structured Merge) tree and uses SSTables and a MemTable for the database implementation. It's written in C++ and available under the BSD license. LevelDB treats keys and values as arbitrary byte arrays and stores keys in order. It uses Snappy for data compression. Writes and reads are concurrent, but write performs best with a single thread, whereas read scales with the number of cores.

BangDB

BangDB is a high-performance, multi-flavored, distributed, transactional NoSQL database for key-value storage. It's written in C++ and available under the BSD license. BangDB treats keys and values as arbitrary byte arrays and stores keys either ordered, using BTREE, or unordered, using HASH. Writes and reads are concurrent and scale well with the number of cores. The BangDB used here is the embedded version, since LevelDB is also an embedded db, but BangDB is also available in other flavors such as client/server, clustered and Data Fabric (upcoming).

The following commodity machine (about $400 of hardware) was used for the tests:
  • Model: 4 CPU cores, Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz, 64bit
  • CPU cache : 6MB
  • OS : Linux, 3.2.0-54-generic, Ubuntu, x86_64
  • RAM : 8GB
  • Disk : 500GB, 7200 RPM, 16MB cache
  • File System: ext4
The following configuration was kept constant throughout the analysis (unless restated before a test):
  • Assertion: OFF
  • Compression for LevelDB: ON
  • Write and Read: Sequential and Random as mentioned before the test
  • Access method: Tree/Btree
  • Key Size: 10 bytes
  • Value Size: 16 bytes
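
To make the key and value sizes above concrete, here is a minimal sketch of how fixed-size sequential and random records could be generated. The post does not show its key generator, so the zero-padded decimal encoding and the helper names (sequential_key, random_key, make_value) are assumptions for illustration only.

```python
import random

KEY_SIZE = 10    # bytes, from the configuration above
VALUE_SIZE = 16  # bytes, from the configuration above


def sequential_key(i: int) -> bytes:
    """Zero-padded decimal key, so byte order matches numeric order (assumed encoding)."""
    if not 0 <= i < 10 ** KEY_SIZE:
        raise ValueError("index does not fit in KEY_SIZE digits")
    return str(i).zfill(KEY_SIZE).encode("ascii")


def random_key(rng: random.Random) -> bytes:
    """Key drawn uniformly from the same 10-digit key space (assumed distribution)."""
    return sequential_key(rng.randrange(10 ** KEY_SIZE))


def make_value(i: int) -> bytes:
    """Fixed-size 16-byte value derived from the record index (assumed content)."""
    return str(i).zfill(VALUE_SIZE).encode("ascii")
```

With this encoding, a sequential run inserts keys in sorted byte order, while a random run scatters them across the key space. That difference is what the sequential and random tests are meant to exercise.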
The tests are designed to cover the following:

Performance of the dbs for
  1. sequential write and read for 100M keys using 1 thread
  2. sequential write and read for 75M keys using 4 threads
  3. random write and read for 75M keys using 1 thread
  4. random write and read for 75M keys using 4 threads
  5. sequential write and read for 1 Billion keys using 1 thread

Note that the data written to disk is much larger than the available RAM on the machine, especially in the billion-key test, so the db has to flush data and make new pages available for ongoing operations to continue. Typically the amount of data written ranges from a few GB to hundreds of GB, with only 8GB of RAM. BangDB was configured with a 5GB buffer pool overall, so it did not use more than 5GB of memory, whereas LevelDB uses as much memory as it can.
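
As a rough check on these sizes, the raw key and value payload for each test can be computed from the 10-byte key and 16-byte value. This is only a back-of-the-envelope sketch: it ignores index, log and compaction overhead, which presumably account for the larger on-disk figures, and the function name is made up for illustration.

```python
KEY_SIZE = 10    # bytes per key, from the test configuration
VALUE_SIZE = 16  # bytes per value, from the test configuration
GB = 10 ** 9     # decimal gigabyte, used here for simplicity


def raw_payload_bytes(num_keys: int) -> int:
    """Bytes of key plus value data only; excludes any storage overhead."""
    return num_keys * (KEY_SIZE + VALUE_SIZE)


for label, n in [("75M", 75_000_000), ("100M", 100_000_000), ("1 Billion", 1_000_000_000)]:
    print(f"{label} keys: {raw_payload_bytes(n) / GB:.2f} GB of raw key/value data")
```

Even before any overhead, the billion-key test carries 26 GB of raw data, several times the 5GB buffer pool and the 8GB of RAM.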

Also, most of the tests are done with a single thread, mainly because LevelDB put performance is best with a single thread and degrades considerably with more concurrent threads. BangDB, however, leverages the CPUs well, and both read and write improve with more threads. Tests 2 and 4 are cases in point.
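
The benchmark harness itself is not shown in this post. As a minimal sketch of how a multi-threaded put test could be structured, the code below splits the key range evenly across threads and reports aggregate ops/sec. DictStore is a hypothetical in-memory stand-in, not the BangDB or LevelDB API, and the partitioning scheme is an assumption.

```python
import threading
import time


class DictStore:
    """Hypothetical stand-in for an embedded key-value store (not BangDB or LevelDB)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key: bytes, value: bytes) -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: bytes):
        with self._lock:
            return self._data.get(key)


def run_put_benchmark(store, num_keys: int, num_threads: int) -> float:
    """Split [0, num_keys) into contiguous ranges, one per thread; return aggregate ops/sec."""

    def worker(start: int, stop: int) -> None:
        for i in range(start, stop):
            store.put(str(i).zfill(10).encode("ascii"), b"v" * 16)

    chunk = num_keys // num_threads
    threads = []
    for t in range(num_threads):
        start = t * chunk
        # The last thread also takes any remainder of the key range.
        stop = num_keys if t == num_threads - 1 else start + chunk
        threads.append(threading.Thread(target=worker, args=(start, stop)))

    begin = time.perf_counter()
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    elapsed = time.perf_counter() - begin
    return num_keys / elapsed
```

For example, run_put_benchmark(DictStore(), 100_000, 4) returns the aggregate put rate for four threads. With a real embedded db, the store object would wrap that db's own put and get calls.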


1. Sequential put and get for 100M keys and values using 1 thread


For put, both dbs perform well, with IOPS around 580,000 ops/sec.



For get, BangDB finishes the operations around 60 seconds before LevelDB, which means BangDB takes around 25% less time than LevelDB.

IOPS for BangDB = 600,000 ops/sec
IOPS for LevelDB = 450,000 ops/sec
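
The elapsed times implied by these get figures follow directly from the key count and the average IOPS. The short calculation below works this out for the 100M-key test. It uses the rounded averages quoted above, so the results are approximate.

```python
def elapsed_seconds(num_ops: int, ops_per_sec: float) -> float:
    """Run time implied by a fixed operation count and an average throughput."""
    return num_ops / ops_per_sec


NUM_KEYS = 100_000_000

bangdb_get = elapsed_seconds(NUM_KEYS, 600_000)   # about 167 seconds
leveldb_get = elapsed_seconds(NUM_KEYS, 450_000)  # about 222 seconds

saved = leveldb_get - bangdb_get       # about 56 seconds
fraction_saved = saved / leveldb_get   # 0.25, i.e. 25% less time

print(f"BangDB: {bangdb_get:.0f}s, LevelDB: {leveldb_get:.0f}s, "
      f"saved: {saved:.0f}s ({fraction_saved:.0%})")
```

This agrees with the roughly 60 seconds and 25% quoted for the get test.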


2. Sequential put and get for 75M keys and values using 4 threads


For put, BangDB improves its performance with 4 threads, whereas the performance of LevelDB with 4 threads drops roughly threefold.

Avg IOPS for BangDB = 680,000 ops/sec
Avg IOPS for LevelDB = 205,000 ops/sec


For get, BangDB again finishes the task well before LevelDB, although later in the run LevelDB picks up and performs with higher IOPS.

Avg IOPS for BangDB = 650,000 ops/sec
Avg IOPS for LevelDB = 385,000 ops/sec


3. Random put and get for 75M keys and values using 1 thread



For put operations with random keys, both LevelDB and BangDB perform well, except that LevelDB takes a dip for around 75 seconds (~23% of its total run time) during which its performance is close to zero. This makes LevelDB take around 75 seconds more than BangDB: BangDB finishes put for 75M keys in 250 sec and LevelDB in around 325 sec.

IOPS for BangDB = 300,000 ops/sec
IOPS for LevelDB = 230,000 ops/sec

Note that the IOPS for both dbs are lower with random keys than with sequential keys. However, the performance of LevelDB goes down by a larger margin.



Here we see interesting data: LevelDB performance for get with random keys goes down drastically, whereas BangDB takes time to pick up but finishes far ahead of LevelDB.

IOPS for BangDB = 210,000 ops/sec
IOPS for LevelDB = 55,000 ops/sec


4. Random put and get for 75M keys and values using 4 threads



BangDB overall performs better than LevelDB, though LevelDB remains consistent throughout.

IOPS for BangDB = 375,000 ops/sec
IOPS for LevelDB = 130,000 ops/sec

The IOPS for BangDB is higher with more threads because BangDB leverages the number of CPUs available on the machine, whereas LevelDB does not for put operations.



Again, BangDB performs better with more threads, but LevelDB, which typically performs better with more threads for get operations, here performs worse than it did with 1 thread. This is mainly because LevelDB is very good at handling sequential ops and performs better with more threads for sequential get operations; for random IO, however, it's evident that its performance degrades with more threads. Again, on a machine with multiple CPUs, BangDB exploits the situation much better.

Avg IOPS for BangDB = 240,000 ops/sec
Avg IOPS for LevelDB = 40,000 ops/sec
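
The effect of going from 1 to 4 threads can be summarized as a ratio of the average IOPS reported in tests 1 to 4. The sketch below computes that ratio; a value above 1.0 means the 4-thread run was faster. Note that the sequential 1-thread test used 100M keys while the 4-thread test used 75M, so the sequential ratios are indicative only.

```python
# (operation, db) -> (IOPS with 1 thread, IOPS with 4 threads), as reported above
reported_iops = {
    ("sequential put", "BangDB"): (580_000, 680_000),
    ("sequential put", "LevelDB"): (580_000, 205_000),
    ("random put", "BangDB"): (300_000, 375_000),
    ("random put", "LevelDB"): (230_000, 130_000),
    ("random get", "BangDB"): (210_000, 240_000),
    ("random get", "LevelDB"): (55_000, 40_000),
}

# Ratio of 4-thread throughput to 1-thread throughput for each case
scaling = {case: four / one for case, (one, four) in reported_iops.items()}

for (operation, db), ratio in scaling.items():
    print(f"{operation:15s} {db:8s} 4-thread / 1-thread = {ratio:.2f}x")
```

In every case listed, the BangDB ratio is above 1.0 and the LevelDB ratio is below 1.0.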


5. Sequential put and get for 1 Billion keys and values using 1 thread



LevelDB performs marginally better than BangDB in terms of IOPS, though the averages reported below are the same; both are very consistent and perform well.

Avg IOPS for BangDB = 560,000 ops/sec
Avg IOPS for LevelDB = 560,000 ops/sec


For read, LevelDB takes a lot more time than BangDB to complete the job.

Avg IOPS for BangDB = 500,000 ops/sec
Avg IOPS for LevelDB = 150,000 ops/sec

LevelDB spends a lot of time initially, almost half of its run time, at just a few thousand ops/sec, and later picks up to a much higher rate.

Conclusion

Both BangDB and LevelDB are high performance databases. However, a few points stand out from the data collected above:

  • BangDB and LevelDB perform very well for sequential operations
  • For random operations, performance of LevelDB goes down considerably, whereas BangDB still performs well
  • BangDB leverages the available CPUs on the machine fully and performs better with more threads (up to the number of CPUs), whereas LevelDB write is best with a single thread and only its sequential read improves with more threads. Hence BangDB is more suitable for multi-core machines
  • Use of SSD would benefit both dbs, but it should benefit BangDB more, as it exploits the CPUs better than LevelDB; in the absence of seek time, BangDB would be expected to give much better performance than LevelDB (to be demonstrated in an upcoming blog)
Please see the earlier blog on the same topic at highscalability. This post also attempts to address the requests, received on that earlier blog at highscalability, for performance tests with a larger number of keys and values.

Note that BangDB is also available as a server, so interested folks can also see the comparison with Redis here

Upcoming blogs will cover BangDB as Data Fabric, Document Database and Columnar DB, each in a separate post

Those interested in trying out BangDB, please visit iqlect