Operational databases such as MongoDB are deployed to support real-time analytic workloads that evaluate millions of diverse data sources, and help enterprises act on them, even as those sources change minute by minute or second by second. Often the shape of the data, whether semi-structured, unstructured, or geospatial, changes faster than systems can cope with. Example applications include a financial services company analyzing ticks, tweets, satellite imagery, and other types of data to inform trading algorithms in real time, or a retailer setting up a digital geo-fence around its brick-and-mortar locations to push in-store incentives to shoppers in real time.
For these real-time analytic applications, performance and availability requirements are very high. To expedite an overall analytics job, application administrators need to accelerate both the data ingestion phase and the time required to mine the ingested data. Databases such as MongoDB can incorporate any kind of data and are built to scale on commodity hardware deployed on-premises or in the cloud. But most IT admins attempt to resolve OLAP performance challenges by simply throwing more hardware resources at the problem, which is often disruptive and highly expensive and may in the end not improve performance.
Figure 2 above depicts an OLAP lab experiment in which MongoDB was used to run analytics on a web server log repository to answer various business intelligence questions. The data load phase consists of importing the data into the MongoDB database, while the run phase consists of running the analytics job on the most recently inserted data set.
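To make the two phases concrete, here is a minimal Python sketch of what a load step and a run step over web server logs might look like. It is illustrative only and is not the harness used in the experiment: the Common Log Format input, the field names, the helper names `parse_log_line` and `top_paths_pipeline`, and the "most requested paths" question are all assumptions. The pipeline uses standard MongoDB aggregation stages (`$match`, `$group`, `$sort`, `$limit`).

```python
import re
from datetime import datetime

# Assumption: logs are in Apache Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Load phase: turn one raw log line into a document, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    size = match.group("size")
    return {
        "host": match.group("host"),
        "timestamp": datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": match.group("method"),
        "path": match.group("path"),
        "status": int(match.group("status")),
        "bytes": 0 if size == "-" else int(size),
    }


def top_paths_pipeline(since, limit=10):
    """Run phase: aggregation pipeline for the most requested paths since a given time."""
    return [
        {"$match": {"timestamp": {"$gte": since}}},          # only recently inserted data
        {"$group": {"_id": "$path", "hits": {"$sum": 1}}},   # count requests per path
        {"$sort": {"hits": -1}},                             # busiest paths first
        {"$limit": limit},
    ]


SAMPLE_LINE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
sample_doc = parse_log_line(SAMPLE_LINE)

# With a running MongoDB instance and pymongo installed, the two phases
# could be driven like this (database and collection names are hypothetical):
#   from pymongo import MongoClient
#   coll = MongoClient()["weblogs"]["access"]
#   coll.insert_many(docs)                                   # load phase
#   results = list(coll.aggregate(top_paths_pipeline(since)))  # run phase
```

Keeping the pipeline as plain data means it can be inspected or unit-tested without a database connection; only the commented-out lines need a live server.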
In the pure HDD test run, the system delivered 207 operations per second (OPS), but when that same data set was accelerated with the PrimaryIO APA technology for MongoDB, the system delivered 88,482 OPS, roughly a 426-fold improvement. What is most compelling is that CacheAdvance was able to achieve 93% of all-SSD performance while requiring only 20% of the total database size to be copied to SSD. That is a significant TCO savings over alternative solutions that are less efficient because they cache all I/O indiscriminately or, in the case of an all-flash array, place the entire database in flash. The PrimaryIO solution, in contrast, is much less expensive than an all-flash array and creates no disruption to operations during installation.
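The throughput comparison can be checked directly from the two quoted figures. The short Python sketch below computes the speedup ratio and the relative improvement from 207 OPS and 88,482 OPS; the function and variable names are illustrative and do not come from the original test setup.

```python
def speedup(baseline_ops, accelerated_ops):
    """How many times faster the accelerated run is than the baseline."""
    return accelerated_ops / baseline_ops


def percent_improvement(baseline_ops, accelerated_ops):
    """Relative gain over the baseline, expressed as a percentage."""
    return (accelerated_ops - baseline_ops) / baseline_ops * 100.0


HDD_OPS = 207             # pure HDD run, as quoted above
ACCELERATED_OPS = 88_482  # accelerated run, as quoted above

ratio = speedup(HDD_OPS, ACCELERATED_OPS)
gain_pct = percent_improvement(HDD_OPS, ACCELERATED_OPS)

print(f"speedup: {ratio:.1f}x")
print(f"improvement over baseline: {gain_pct:,.0f}%")
```

The ratio and the percentage differ by a factor of 100 (after subtracting the baseline), so it is worth being explicit about which of the two is being quoted.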