Python for the Web (in 2019)

Posted on Sat 09 February 2019 in Python

Last year I wrote about Python for the Web - a boilerplate project to kick off my web development. In the year since that post I have started a multitude of projects, and along the way I have needed to update and upgrade the project …

Python for the Web

Posted on Sun 18 February 2018 in Python

Having worked on so many Python web projects recently, I realised I needed some kind of boilerplate to get new projects started, hence flask-boilerplate.

A lot of this is based on the excellent cookiecutter-flask. I've learned a lot from it, but made enough modifications and additions to call this …

PostgreSQL 10: Partitions of... partitions!

Posted on Sat 20 January 2018 in Postgresql

PostgreSQL version 10 brings a much-anticipated feature: (Native) Table Partitioning.

(Emphasis on NATIVE: PostgreSQL supported partitioning in previous versions by other means, such as table inheritance.)

There are a few gotchas to keep in mind when using this new feature: no primary keys allowed, no ON CONFLICT clauses, etc. In my …

Go: The Diamond Problem

Posted on Wed 12 July 2017 in Go

I've witnessed some of the online debate when it comes to Go and OOP. Some people say the language is object-oriented, others don't share that view. I think it's safe to say that Go isn't a traditional object-oriented language.

Lately, I've been trying to learn more about Go from an …
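To make the diamond concrete, here is a minimal sketch (the Animal, Walker, Swimmer and Duck types are hypothetical, not taken from the post). Two structs embed the same base and a third embeds both. Go doesn't pick a winner: referring to the promoted field or method through the outer type is a compile-time "ambiguous selector" error, so the outer type has to spell out which path it means.

```go
package main

import "fmt"

// Animal is the hypothetical "top" of the diamond.
type Animal struct{ Name string }

func (a Animal) Describe() string { return "animal " + a.Name }

// Walker and Swimmer both embed Animal (the two "sides" of the diamond).
type Walker struct{ Animal }
type Swimmer struct{ Animal }

// Duck embeds both, so Name and Describe are each promoted twice at the
// same depth. Without the method below, d.Describe() would not compile.
type Duck struct {
	Walker
	Swimmer
}

// Describe on Duck (depth 0) shadows the ambiguous promoted methods and
// explicitly chooses the path through Walker.
func (d Duck) Describe() string {
	return "duck (" + d.Walker.Describe() + ")"
}

func main() {
	d := Duck{
		Walker:  Walker{Animal{Name: "Donald"}},
		Swimmer: Swimmer{Animal{Name: "Donald"}},
	}
	fmt.Println(d.Describe())   // duck (animal Donald)
	fmt.Println(d.Swimmer.Name) // Donald
	// fmt.Println(d.Name) would fail: ambiguous selector d.Name
}
```

Note that Duck carries two independent copies of Animal; embedding is composition, so there is no shared base the way C++ virtual inheritance provides.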

Go: Rate-limiting (done right)

Posted on Fri 24 February 2017 in Go

A couple of weeks ago I wrote a post on Data Pipelines with Go, Kafka and Cassandra. Towards the end of the post I presented a very rudimentary (and wrong) approach to rate-limiting (in the specific case of Cassandra write timeouts).

I wanted to emphasize how performant the solution is …

Data Pipelines: Cassandra, Kafka and Python (and Go!)

Posted on Mon 06 February 2017 in Go

Last year I started working on a 'Big Data' exercise. It's an ongoing project that mixes large amounts of web traffic, data ingestion and analytics. It's also really fun. We get to play with an array of new technologies - sometimes on a bet, granted - but most of the time it …

Valgrind and macOS Sierra

Posted on Sun 06 November 2016 in Docker

Valgrind is a code profiling tool for Linux; it is widely known for its memory leak detection and debugging capabilities.

At the time of writing, Valgrind is still not compatible with macOS Sierra (10.12); you can read more about it here.

Docker to the rescue

To get around this …

Why I'm excited about Apache Arrow

Posted on Sun 11 September 2016 in Arrow

Disclaimer: After reading up on the limited amount of information available about Apache Arrow, I've drawn my own conclusions about what the project aims to fix. If any of the information below is incorrect, shoot me an email or post a comment so that I can rectify any mistakes!

Apache Arrow …

Cassandra and Spark

Posted on Sat 28 May 2016 in Spark

In my previous post I went over Running Apache Spark on a cluster.

Spark can read from and write to many data sources, including Apache Cassandra.

Cassandra is a distributed database management system. It is considered a NoSQL database (the usage of such a term is questionable, albeit outside of the …

Running Apache Spark on a cluster

Posted on Tue 03 May 2016 in Spark

Apache Spark is a general-purpose data processing and analysis engine.

On the surface, it helps developers working with large data sets by providing easy-to-use libraries and modules. Spark integrates with various data sources (CSV, HDFS, remote databases, etc.), and actions can then be performed against the data.

In the …
