Funnel charts are mostly used for representing a sequential process, allowing the viewers to compare and see how the numbers change through the stages.
In this article, we’ll explore how to build a funnel chart from scratch using Matplotlib, and then we’ll have a look at an easier implementation with Plotly.
There is no method for instantly creating funnel charts in Matplotlib, so let’s start with a simple horizontal bar chart and build from there.
import matplotlib.pyplot as plt

y = [5,4,3,2,1]
x = [80,73,58,42,23]

plt.barh(y, x)
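One way to move from that bar chart toward a funnel shape is to center every bar on the widest one. The sketch below is a minimal take on that idea, not necessarily the approach the full article follows: it reuses the same `x` and `y` values and assumes the `left` parameter of `barh` is a reasonable way to shift the bars. The `Agg` backend is only set so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

y = [5, 4, 3, 2, 1]
x = [80, 73, 58, 42, 23]

# Shift each bar right by half the difference to the widest bar,
# so that all bars share the same center line.
x_max = max(x)
left = [(x_max - value) / 2 for value in x]

fig, ax = plt.subplots()
bars = ax.barh(y, x, left=left)

# Hide the axes: positions along x no longer carry meaning once bars are centered.
ax.set_xlim(0, x_max)
ax.axis("off")
```

Calling `plt.show()` on an interactive backend, or `fig.savefig(...)`, displays the result. The values themselves are still encoded by bar length, so labels on each bar would be a natural next step.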
In this article, we’ll explore Kepler.gl, an open-source solution for geospatial data visualization and exploration. Kepler was developed by Uber to make it easier for users of all levels to design meaningful maps that also look good. The tool can handle large amounts of data and has a friendly, intuitive interface that allows users to build effective maps in an instant.
Available for all to use since 2018, it’s about time we get a closer look at how the tool fits into the data visualization landscape. …
Data visualization is all about reducing complexity; we use graphical representations to make difficult concepts and insights more comfortable to understand.
Titles, subtitles, notes, annotations, and labels serve an essential function in this process. They guide our audience through the story we’re trying to tell, much like a narrator.
In this article, we’ll explore the functions of titles, subtitles and labels, get a look at how to add annotations to our charts and check how to use custom fonts in Matplotlib.
Let’s start with a simple line chart.
import matplotlib.pyplot as plt

# data
spam = [263.12, 302.99, 291.23, 320.68, 312.17, 316.39,
347.73, 344.66, 291.67, 242.42, 210.54, …
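To make the roles of titles, labels, and annotations concrete, here is a small sketch. The numbers are hypothetical placeholders rather than the data above, and the wording of every text element is made up for illustration. It only relies on standard Matplotlib calls (`suptitle`, `set_title`, `set_xlabel`, `set_ylabel`, `annotate`).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical placeholder values, not the article's dataset.
values = [3, 5, 4, 7, 6, 9]
months = list(range(1, len(values) + 1))

fig, ax = plt.subplots()
ax.plot(months, values)

# The figure title states the main message; the axes title acts as a subtitle.
fig.suptitle("Main title")
ax.set_title("Subtitle with extra context", fontsize=10)

# Axis labels tell the reader what each dimension measures.
ax.set_xlabel("Month")
ax.set_ylabel("Value")

# An annotation points the reader to one specific data point, here the maximum.
peak = max(values)
peak_month = months[values.index(peak)]
ax.annotate(
    "Peak",
    xy=(peak_month, peak),          # the point being annotated
    xytext=(peak_month - 2, peak),  # where the text is placed
    arrowprops={"arrowstyle": "->"},
)
```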
Clustering sure isn’t something new. MacQueen developed the k-means algorithm in 1967, and since then, many other implementations and algorithms have been developed to perform the task of grouping data.
In this article, we’ll explore how to improve our cluster’s visualization with scatter plots.
Let’s start by loading and preparing our data. I’ll use a dataset of Pokemon stats.
import pandas as pd

df = pd.read_csv('data/Pokemon.csv')

# prepare data
types = df['Type 1'].isin(['Grass', 'Fire', 'Water'])
drop_cols = ['Type 1', 'Type 2', 'Generation', 'Legendary', '#']
df = df[types].drop(columns = drop_cols)
df.head()
Data by itself can be quite interesting, but even if you’re dealing with a small dataset, the chances are that you’ll have to summarize or aggregate it in some way. That’s where we’ll need groups.
Sure, it’s nice to know the total amount of sales. But it’s often more interesting to know the total amount of sales by salesperson, or by month.
Grouping data is undeniably essential for data analysis, and in this article, I’ll investigate some of the methods for doing so with R, Tidyverse and dplyr.
The dataset I’ll use for the next examples comes from Kaggle and contains Spotify’s top songs from 2010 to 2019. …
There are plenty of ways to display three variables in a single visualization. Heatmaps and Colormaps rely on encoding the third variable in the color. Bubble charts are Scatter Plots with the third variable encoded in size, and other solutions may introduce a Z-axis and rely on 3-dimensional representations.
Ternary plots are a lesser-known solution that doesn’t require our user to compare colors, circle sizes, or 3D distances.
They’re a two-dimensional representation where all three variables are encoded by position along three connected axes that form a triangle.
This article will go through the basics of how to draw ternary scatter plots using Plotly Express. …
I’ve been playing around a lot with R’s ggplot and decided to compare it with Python’s Matplotlib.
In some ways, they feel very similar but also not at all. So I decided to build a scatter plot with R and replicate it with Python to check their advantages and disadvantages.
A prevalent task in any data analysis is comparing multiple sets of something. You may have lists of IPs for each landing page of your website, clients who bought certain items from your store, multiple answers from a survey, and so many others.
This article will use Python to explore ways to visualize overlaps and intersections of sets, the possibilities, and their advantages and disadvantages.
For the next examples, I’ll use a dataset from the Data Visualization Society 2020 Census.
I’m using the survey because it has many different types of questions, where some are multiple-choice questions with multiple answers, like the one below. …
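Before drawing anything, the overlaps themselves can be computed with Python's built-in sets. The sketch below uses made-up respondents and answers, not the census data, to show how a multiple-answer question turns into sets that can be intersected.

```python
# Hypothetical answers: each respondent could select several tools.
answers = {
    "r1": {"Python", "Excel"},
    "r2": {"R", "Excel"},
    "r3": {"Python", "R", "Excel"},
    "r4": {"Python"},
}

# Invert the mapping: for each tool, collect the set of respondents who chose it.
users = {}
for respondent, tools in answers.items():
    for tool in tools:
        users.setdefault(tool, set()).add(respondent)

both = users["Python"] & users["R"]         # intersection: chose both tools
either = users["Python"] | users["R"]       # union: chose at least one of them
only_python = users["Python"] - users["R"]  # difference: Python but not R
```

The sizes of these sets (`len(both)`, `len(only_python)`, and so on) are the numbers a Venn diagram or similar chart would encode.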
There are plenty of ways to build animations in Matplotlib. They even have an Animation class with functions and methods to support this task.
But I often find those methods over-complicated, and many times I want to get something together without too much complexity.
In this article, I’ll go through the basics of creating charts, saving them as images, and using Imageio to create a GIF.
You’ve probably heard lots of reasons not to use pie charts. The lack of precision, hard-to-read angles, scalability limitations, and poor ink ratio are some of the most mentioned ones.
According to John W. Tukey, a famous statistician known for developing the FFT algorithm and box plots:
“There is no data that can be displayed in a pie chart, that cannot be displayed better in some other type of chart.” — John Tukey
Alright, I get it — Pie charts bad.
But still, if you google ‘data visualization’ and go to images, I bet you’ll find lots of pies there. …