Cassandra and Spark

Posted on Sat 28 May 2016 in Spark

In my previous post, I went over Running Apache Spark on a cluster.

Spark can read from and write to many data sources, including Apache Cassandra.

Cassandra is a distributed database management system. It is considered a NoSQL database (the usage of that term is questionable, albeit outside of the …

Continue reading

Running Apache Spark on a cluster

Posted on Tue 03 May 2016 in Spark

Apache Spark is a general-purpose data processing and analysis engine.

On the surface, it helps developers work with large data sets by providing easy-to-use libraries and modules. Spark integrates with various data sources (CSV, HDFS, remote databases, etc.), and actions can then be performed against the data.

In the …

Continue reading