Apache Spark became popular as an in-memory data processing framework that is frequently used with Hadoop. It is also quickly becoming a hub for building other data-processing products. The recently released version 2.0 of Splice Machine, the SQL-RDBMS-on-Hadoop solution, uses Spark as one of its two processing engines. Incoming work is allocated between them depending on whether it is an OLTP or an OLAP workload.
Splice Machine is positioned as a replacement for multi-terabyte workloads on existing ACID RDBMS solutions such as Oracle. The company claims that workloads for one former Oracle customer ran an order of magnitude faster, and that Hadoop’s native scale-out architecture lets the solution grow with the size of the workload at a lower cost than an existing RDBMS.
Monte Zweben, Splice Machine co-founder and CEO, said in an interview that Splice Machine’s big innovation is letting OLTP and OLAP workloads run in parallel on the same data and architecture but with different processing engines, which ultimately makes it easier to make business decisions with the data.
The architecture inspects each query coming into the system to determine whether it is an OLTP or an OLAP query, then sends it to the appropriate computational engine. Transactional queries run on HBase, and OLAP queries are processed by Spark. This keeps memory and CPU usage for each kind of query segregated.
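To make the routing idea concrete, here is a minimal Python sketch. It is only an illustration: the article does not describe how Splice Machine actually classifies queries, so the keyword heuristic, the `route_query` function, and the `"hbase"` / `"spark"` labels are all assumptions made for this example.

```python
import re

# Hypothetical heuristic for illustration only; this is NOT Splice Machine's
# actual planner, whose classification logic the article does not describe.
OLAP_HINTS = re.compile(r"\b(GROUP\s+BY|JOIN|SUM|AVG|COUNT|OVER)\b", re.IGNORECASE)
WRITE_STATEMENT = re.compile(r"^\s*(INSERT|UPDATE|DELETE)\b", re.IGNORECASE)


def route_query(sql: str) -> str:
    """Return the (assumed) engine label for a SQL statement."""
    # Writes are treated as transactional work and sent to the OLTP engine.
    if WRITE_STATEMENT.match(sql):
        return "hbase"
    # Aggregations, joins, and window functions are treated as analytical.
    if OLAP_HINTS.search(sql):
        return "spark"
    # Everything else (for example, a short key lookup) stays on the OLTP side.
    return "hbase"


if __name__ == "__main__":
    for statement in (
        "SELECT * FROM orders WHERE id = 42",
        "SELECT region, SUM(total) FROM orders GROUP BY region",
        "UPDATE orders SET status = 'shipped' WHERE id = 42",
    ):
        print(route_query(statement), "<-", statement)
```

A real system would more plausibly decide from the optimizer’s estimate of how much data a query touches than from keywords, but the sketch shows the shape of the decision: one entry point and two engines, so that analytical scans do not compete with short transactions for the same memory and CPU.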
Spark’s original aim was to give data scientists an easy way to perform the kind of data processing that would otherwise require a lot of coding. More recently it has been used to rewrite IBM DataWorks, a data-transformation product. In Splice Machine’s case, however, Spark adds entirely new functionality rather than just enhancing an existing product.
Splice Machine faces tough competition in a field that is crowding very quickly. There are plenty of alternatives, including NewSQL, NoSQL, and in-memory processing, and most of them are designed to serve specific use cases at very high speed. Established database players such as Oracle, Microsoft, and Postgres keep expanding their offerings to compete with NoSQL and in-memory databases, while Hadoop vendors are strengthening their distributions to meet the demand for quick analytics results. Hadoop also has many other features that make it a single solution for scalable storage.
The main selling point of Splice Machine is that it allows reuse of existing ANSI SQL, since the speed and compatibility concerns with SQL-on-NoSQL solutions are becoming easier to overcome with time.
We look forward to responding to your questions and comments about how Apache Spark boosts SQL efficiency on Hadoop in Splice Machine 2.0.
About Singsys Pte. Ltd. Singsys is a solution provider that offers user-friendly solutions built on cutting-edge technologies to engage customers and boost your brand’s online results. Its team of certified developers and designers makes optimal use of the available resources to turn a client’s idea into a mobile application, web application, or e-commerce solution.