Apache Superset: a feature-packed open source tool for interactive BI visualization.
Apache Superset is a modern big data exploration and BI visualization platform that lets users build dashboards quickly and easily, with little or no code, through a no-code visualization builder and a powerful SQL editor.
The project was launched at Airbnb in 2015 and entered the Apache Incubator in May 2017.
BI tools are a powerful weapon for data analysis, and there are many BI products on the market today. The Superset backend is based on Python, so…
Apache Beam, one of Apache's newer open source projects, is a unified programming model for expressing efficient and portable big data pipelines. How?
i) Unified: one unified API to process both batch and streaming data
Batch + Stream → Beam
ii) Portable: a Beam pipeline, once written in any supported language, can run on any of the supported execution frameworks, such as Spark, Flink, Apex, Cloud Dataflow, etc.
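The "unified" idea can be sketched in plain Python. This is only an illustration of the concept, not the actual Beam SDK API: a transform written once runs unchanged over a bounded collection (batch) and an unbounded generator (stream).

```python
from typing import Iterable, Iterator

def to_upper(records: Iterable[str]) -> Iterator[str]:
    """A transform defined once, independent of whether its source is bounded."""
    for record in records:
        yield record.upper()

# Batch: a bounded, in-memory collection.
batch = ["spark", "flink", "beam"]
print(list(to_upper(batch)))  # ['SPARK', 'FLINK', 'BEAM']

# Stream: an unbounded source, modeled here by a generator.
def incoming_stream() -> Iterator[str]:
    for record in ["kafka", "pulsar"]:  # stands in for an infinite feed
        yield record

for out in to_upper(incoming_stream()):
    print(out)
```

In real Beam, the analogue of `to_upper` would be a `PTransform` applied to a `PCollection`, and the choice of runner (Spark, Flink, Dataflow) is a deployment decision, not a code change.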
It was started in 2016 and has become a top-level Apache project. It was originally developed by Google, which usually releases whitepapers on many of its distributed data engineering systems …
So we all know that Kafka is a messaging system. That definition may sound fancy, but Kafka is really a very powerful, high-throughput, and easily configurable streaming solution. It can be made to operate in real time or in near real time, depending on your use case. For simplicity, and to make this use case easier to understand, I will use the term “data” in place of “message”.
Data in Kafka flows in the form of bytes. That is the very reason you need to specify the serializer you opt for in your client, or a custom…
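Because the broker only ever sees bytes, a producer must serialize each value on the way in and a consumer must deserialize it on the way out. A minimal sketch of that contract in plain Python (the function names are illustrative, not part of any Kafka client):

```python
import json

def json_serializer(value: dict) -> bytes:
    # Producer side: turn an object into the bytes that go on the wire.
    return json.dumps(value).encode("utf-8")

def json_deserializer(raw: bytes) -> dict:
    # Consumer side: turn wire bytes back into an object.
    return json.loads(raw.decode("utf-8"))

event = {"user": "alice", "action": "click"}
wire = json_serializer(event)

assert isinstance(wire, bytes)          # the broker only sees bytes
assert json_deserializer(wire) == event # round-trip recovers the object
```

With a real client such as kafka-python, a function like this is passed as the `value_serializer` argument to `KafkaProducer` (and a matching `value_deserializer` to `KafkaConsumer`), so the conversion to and from bytes happens transparently.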
Kubernetes is a container orchestration solution, and it is slowly becoming the platform of choice for hosting Spark. Why? Let’s get to it.
In this post I would like to cover the following points:
If you’re already familiar with k8s and why Spark on Kubernetes might be a fit for you, feel free to skip the first couple of sections and…
Fan of Apache Spark? I am too, and the reason is simple: interesting APIs to work with, fast and distributed processing, no I/O overhead (unlike MapReduce), fault tolerance, and much more. With all this, you can do a lot in the world of big data and fast data. From processing huge chunks of data to working on streaming data, Spark handles it all flawlessly. In this blog, we will be talking about the streaming power we get from Spark.
Spark provides us with two ways to work with streaming data.
Let’s discuss what are…
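One core idea behind Spark's streaming support is discretizing an unbounded stream into small batches that are then processed like ordinary batch jobs. A plain-Python sketch of that micro-batch model (this is an illustration of the concept, not the PySpark API):

```python
from itertools import islice
from typing import Iterator, List

def micro_batches(stream: Iterator[int], batch_size: int) -> Iterator[List[int]]:
    # Chop an unbounded stream into fixed-size micro-batches, the way
    # Spark discretizes a stream into small chunks by time interval.
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch

events = iter(range(7))  # stands in for an incoming event stream
totals = [sum(batch) for batch in micro_batches(events, 3)]
print(totals)  # [3, 12, 6] -> sums of [0,1,2], [3,4,5], [6]
```

In Spark itself, each micro-batch is processed with the same engine and APIs used for batch data, which is what makes the batch and streaming programming models feel so similar.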
I am an avid technologist with 12+ years of experience in data analytics, passionate about sharing new concepts and learnings in the big data and advanced analytics domain.