Data Science – First Impressions

After some thought on what to learn next, I enrolled in the IBM Data Science Professional Certificate program. It is a beginner-level program that provides the necessary foundations for Data Science. I have completed 3 courses so far:

  1. What is Data Science?
  2. Tools for Data Science
  3. Data Science Methodology

I have liked all the courses so far. Despite the big presence of IBM Cloud tools, the courses cover the concepts in a fairly generic way.

This post is NOT a review of the course or certification.

This post is mostly about my first impressions on the domain of Data Science.

1. It’s not strictly a science

Data Science doesn’t have a clear definition. Everyone defines it the way they want. Murtaza Haider, author of Getting Started with Data Science, puts it simply as

Data Science is what data scientists do

It is not statistics either, so I guess instead of inventing a new word for experimental statistics, data analysis, visualisation and model building, someone called it science.

2. Engineers will probably hate it

If there is one underlying principle that defines engineers, it is their love of certainty. Engineers strive to build systems governed by a set of rules that produce predictable outcomes. Data Science is quite the other way around: you start with a question and move towards an answer, which could be anything, valid or invalid. It is a frustrating experience to go the whole way without knowing where you will end up. Sure, we get some hints here and there throughout the process. If you have the “journey is more important than destination” attitude, it is a good ride. But if you yearn for certainty or get frustrated midway through the process, you will probably end up torturing the data until it confesses whatever you want it to.

3. The Jargon

It is a field full of jargon. Everything has a specific word/phrase. If you open the data in Excel and look at it, you will probably call it inspecting the data. As a data scientist, you do the same with a couple of lines of Python code in a Jupyter Notebook and call it “Exploratory Data Analysis” or EDA. If you change values or format them to a specific standard, it is “Data Manipulation” or DML (did that even need a three-letter acronym?). If you query some data and do some manipulations on it, it is an Extract-Transform-Load or ETL pipeline.

While some of them are valid terminologies (like ETL) which exist to communicate effectively, some of them are just marketing jargon which exists to make stuff seem bigger than they are.

4. The Ah..ha moments

This, in my opinion, is the best thing about Data Science. It is those moments when your perception is altered by the data, or when the output is altered by your perception. Engineering by its nature is the application of established principles; we get our ah..ha moments when we learn a new concept invented by a scientist. But in data science, the exploratory nature of the work means the data itself can produce such moments. Since we start with a question and follow the data to the answer, it usually results in a light bulb moment.

5. Practicality

This is the area I am most conflicted about. Things like “Data is the new oil” have been said for close to a decade now, and the general consensus seems to be the more data the better. But both my personal experience as a teacher and the case study about a health insurance provider in the course have made me wary of the practicality of this approach. I could write about my personal experience as anecdotal evidence, but I think I should give a more data-oriented reason (the post being about data science and all):

Explosion of healthcare administrators in the US

While studying the data science methodology, it becomes acutely clear how this happened. I am not saying data science is the root cause of the problem, but it is definitely a contributing factor, as no model built by any data scientist can be static. It is an iterative process that keeps the cycle of data collection, modelling and output going. So when overdone, it feels more like a problem than a solution.

6. The machines are coming for the data scientists

The hype for data scientists went mainstream, I think, with the McKinsey report predicting a shortage of 140,000 to 190,000 data professionals in the US alone by 2018, followed by data scientists becoming among the highest paid in technical jobs. Going by the IBM tools that I have used during the course, I don’t think the hype will age well.

Money is in automation – data scientists spend an estimated 70 to 90% of their time collecting and preparing the data for analysis. If there is anything computers were invented for, it is automating such mundane processes. Even if a company saves 50% of that time, it is a great cost saving. So automation tools will cut down the work, and with it the demand for data scientists.

Side Note: You can create a free account at cloud.ibm.com and check out their tools to get an idea of the direction and sophistication of the tools that are being developed.

Uncertainty of the outcome – as mentioned earlier, not all organisations will gain from having a data science team. Apart from the data, a variety of things – organisation size, scope for data-driven decision making and the talent of the data science team – all impact the outcome of the exercise. Combined with off-the-shelf offerings from IT companies, the data scientist’s role might shrink to just that of a computer operator in some cases.

No, I am not saying data scientists will become obsolete. Just that it is not going to live up to the hype.

Before you throw brickbats

I have completed only 3 of the 9 courses in the program. Once I complete another 3 courses, I think I will have a better understanding of things and will revisit the topic.

text/plain MIME Type and Python

When you do echo "x" > my_file and then check its MIME type using file --mime-type my_file, it says text/plain. But when you do the same in Python with

with open("my_file_2", "w") as fp:
    fp.write("x")

and then check the MIME type, it says application/octet-stream. What’s the difference?

For the impatient

echo adds a newline to the file, which tells the file utility it is a text file.
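Based on that, here is a minimal sketch of the fix on the Python side (my_file_3 is a hypothetical file name used only for this illustration):

with open("my_file_3", "w") as fp:
    fp.write("x\n")  # the trailing newline is what makes `file` treat it as text

Checking this file with file --mime-type should now report text/plain, just like the echo-created file.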

For the curious

When I saw this question on StackOverflow, I was really stumped due to the following reasons:

  1. I didn’t know the file utility could be used to get the MIME type of a file. I thought MIME types were only relevant in the context of web servers and clients. After all, MIME stands for Multipurpose Internet Mail Extensions.
  2. I thought operating systems usually use the file extension to decide the file type and, by extension, the MIME type. Don’t OSes warn us every time we touch the extension part of a file name while renaming? So how does the file utility do this on files without any extension?

Adding extensions

Let’s try adding extensions:

$ echo "x" > some_file.txt
$ file --mime-type some_file.txt
some_file.txt: text/plain

Okay, that’s all good. Now to the Python side:

with open("some_file_2.txt", "w") as fp:
    fp.write("x")
$ file --mime-type some_file_2.txt
some_file_2.txt: application/octet-stream

What? file doesn’t recognise file extensions?

The OS conspiracy theory

Maybe echo writes the MIME type as metadata onto the disk, because echo is a system utility and knows how to do that, and in Python the user (me) doesn’t know how to? Clearly the operating system utilities are a cabal with some forbidden knowledge. And I am going to uncover that today, starting with the file utility, which seems to have different answers for different programs.

How does ‘file’ determine MIME Type?

The answers to this question have some useful information:

How do you change the MIME type of a file from the terminal?

  1. MIME Type is a fictional value. There is no inherent metadata field that stores MIME Types of files.
  2. Each operating system uses a different technique to decide file type. Windows uses the file extension, Mac OS uses type & creator codes, and Unix uses magic numbers (a quick Python check of the extension-based approach follows this list).
  3. The file command guesses file type by reading the content and looking for magic numbers and strings.
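As a side note, Python’s built-in mimetypes module takes the extension-based approach from point 2 rather than reading content, so the two guessing strategies are easy to compare:

import mimetypes

# mimetypes guesses purely from the file name / extension, never the content
print(mimetypes.guess_type("some_file_2.txt"))  # ('text/plain', None)
print(mimetypes.guess_type("my_file_2"))        # (None, None) - no extension, no guess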

Time to reveal the magic

Let us peer into the souls of these files in their purest forms where there is no magic but only 1s and 0s. So, I printed the binary representation of the two files.

$ xxd -b my_file
00000000: 01111000 00001010 x.

$ xxd -b my_file_2
00000000: 01111000 x

The file generated by echo has two bytes (notice the . after the x) whereas the file I created with Python only has one byte. What is that second byte?

>>> number = int('00001010', 2)
>>> chr(number)
'\n'

And it turns out, like in every movie about magic, there is no such thing as magic. Just clever people putting in newlines to tell file it is a text file.

Creating a trick

Now that the trick is revealed, let’s create our own magic trick:

$ echo "<file></file>" > xml_file
$ file --mime-type xml_file
xml_file: text/plain

$ echo '<?xml version="1.0"?><file></file>' > xml_file
$ file --mime-type xml_file
xml_file: text/xml

Useful Links

  1. https://www.baeldung.com/linux/file-mime-types
  2. https://unix.stackexchange.com/questions/185216/file-command-apparently-returning-wrong-mime-type
  3. https://stackoverflow.com/questions/29017725/how-do-you-change-the-mime-type-of-a-file-from-the-terminal

Thinking about the next step as a Python Developer

Disclaimer: This is not a bulletproof analysis. Despite writing code and throwing around numbers, this might still be a random observation. The Stack Overflow job board is not an indicator of everything. So take everything with a pinch of salt.

Python or Full Stack?

As someone who identifies as a Python Developer and uses Python as the primary language for programming, thinking about what to learn next is confusing. I primarily do “Full Stack development”, that is, writing backend APIs in Python (Flask/Django) and using modern JS (React/Vue) for the frontend. But recently I have been seeing a disconnect between the idea of a “Python Developer” and a “Full Stack Developer” in job postings. Full Stack development seems to be defined more or less synonymously with JS development. Python job postings, on the other hand, are closer to DevOps, systems development, data analytics and machine learning.

A simple experiment

So, I devised a simple experiment to verify that what I am seeing is in fact something of a trend and not a random observation: look at the job postings on Stack Overflow for skill demand and use that as an indicator.

  • Go to Stack Overflow Jobs
  • Set the “Tech you like” to node.js – 139 jobs
  • Set the “Tech you like” to django + flask – 61 jobs
  • Set the “Tech you like” to python and “Tech you don’t like” to django + flask – 266 jobs

huh, what?

Web development in NodeJS is more in demand than web development in Python, and Python’s general-purpose demand is far higher than Python web development.

For the sake of simplicity, I have considered Django + Flask as representing the whole of Python web development.

Okay, so my observation that “Full Stack Development” is becoming more and more Node.js development is in fact valid, and I wasn’t imagining it.

What’s up with Python?

Now I was intrigued to find out what is happening with Python. What are those 266 other listings? Where is the demand for Python coming from? Also, I was getting a little worried about continuing as a Full Stack (Python) Developer. To get a better idea of the situation, I downloaded the RSS feed of the job postings for Python + NodeJS, extracted the categories for each posting and created a table of the 30 most mentioned categories.

Selection_022

Each job posting is a row, with the top 30 categories as columns. If a job posting falls under a category, it is marked 1, otherwise 0. Then I created a heatmap of the correlation matrix of the table, which tells me how the technologies are related to each other.
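For illustration, here is a minimal sketch of how such a table and heatmap could be built. The feed URL is hypothetical, and the use of feedparser, pandas and matplotlib is my assumption; this is not the exact code behind the chart.

import feedparser            # assumed: used to read the job postings RSS feed
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical feed URL, for illustration only
feed = feedparser.parse("https://stackoverflow.com/jobs/feed?tl=python+node.js")

# Each posting's tags act as its categories
postings = [[tag.term for tag in entry.get("tags", [])] for entry in feed.entries]

# Keep the 30 most mentioned categories
counts = pd.Series([cat for cats in postings for cat in cats]).value_counts()
top30 = counts.head(30).index

# One row per posting, one column per category: 1 if the posting has it, else 0
table = pd.DataFrame([{cat: int(cat in cats) for cat in top30} for cats in postings])

# Correlation matrix of the one-hot table, rendered as a heatmap
corr = table.corr()
fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(corr, cmap="RdYlGn", vmin=-1, vmax=1)
ax.set_xticks(range(len(top30)))
ax.set_xticklabels(top30, rotation=90)
ax.set_yticks(range(len(top30)))
ax.set_yticklabels(top30)
fig.colorbar(im)
fig.tight_layout()
plt.show()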

heatmap

Anything that has a positive correlation has the value printed in green. Armed with this data, we can make a few observations:

  1. REST and API are not correlated with Python, but with NodeJS
  2. Even Flask and Django are not correlated with REST API
  3. Python is associated with systems languages like C, C++ and Java more than with web technologies.
  4. Flask is associated with more technologies than Django, especially micro-services technologies like Docker and Kubernetes.

Predictions

  1. General-purpose Python web development will probably go the way of Ruby on Rails.
  2. Python web development’s growth will increasingly come from micro-services and API systems sitting on top of Machine Learning / AI based services.
  3. PHP, .NET and Java all have their own full-stack definitions and job postings, but I think NodeJS will continue to dominate the term.

Final Thoughts

Identifying the next thing to learn as a Python Developer doing Full Stack Development means identifying an area of focus. Taking the time to retool in Node.js might be as good a choice as learning micro-services architecture with a bit of DevOps, or learning Data Science and ML. Picking a path and moving ahead is looking more and more like a necessity at this point.

And I am left wondering which path to choose.

Jupyter – Finding the point when a line graph crosses the threshold

A friend of mine came up with this problem. There is a set of points [(x1, y1), (x2, y2), …] forming a line graph. He wanted to find the points at which the line crosses a value Z below the peak. For example, if the maximum value is 100 and Z = 20, he wanted to find the points where it crosses y = 80.

Now there are multiple ways to solve this problem. I attempted a simple linear interpolation solution.

I don’t consider the solution itself to be a big deal. What I am really impressed by is how neatly I was able to present the solution to him using a Jupyter Notebook.

I was able to document the solution in a step-by-step fashion, with a visual representation of how I solved it.

Take a look

from IPython.display import display
import matplotlib.pyplot as plt

f = [100, 102, 103.5, 105.5, 106.5, 107.5, 108.5, 110]
mag = [0, 30, 40, 145.3, 166.5, 164.5, 75.79, 65.3]

fig, ax = plt.subplots()
ax.plot(f, mag)

for x,y in zip(f, mag):
    label = ax.text(x, y, y)

fig.tight_layout()

inter_fig_1

gap = 30

# 1. Find the maximum value
max_mag = max(mag)

# 2. Set the threshold value
y = max_mag - gap

ax.hlines(y, f[0], f[-1], linewidth=0.5, color="cyan")
display(fig)

inter_fig_2

# 3. Find the index of the maximum value
max_idx = mag.index(max_mag)

# 4. Find the left and right values which are lower than the "y" you are looking for
left_start_idx = None
left_end_idx = max_idx
right_start_idx = max_idx
right_end_idx = None

for i in range(len(mag)):  # iterate far enough to cover both sides of the peak
    left_idx = max_idx - i
    right_idx = max_idx + i

    # if left index is more than Zero (array left most is 0) and left is not yet set
    if left_idx >= 0 and left_start_idx is None:
        value = mag[left_idx]
        # if the value is lower than our threshold then pickup the point
        # and the one next to it
        # that will form our segment to interoploate
        if value < y:  
            left_start_idx = left_idx
            left_end_idx = left_idx + 1


    # if the right index is less than our array size (0..N) and right is not yet set
    if right_idx < len(mag) and right_end_idx is None:
        value = mag[right_idx]
        if value < y:
            right_end_idx = right_idx
            right_start_idx = right_idx - 1

if right_end_idx is None:
    print("Cannot find point on the right lower than %d" % (y))

if left_start_idx is None:
    print("Cannot find point on the left lower than %d" % (y))

# Plotting the lines we will be interpolating

if left_start_idx is not None and right_end_idx is not None:
    ax.plot(
        [f[left_start_idx], f[left_end_idx]],
        [mag[left_start_idx], mag[left_end_idx]],
        color='red'
    )
    ax.plot(
        [f[right_start_idx], f[right_end_idx]],
        [mag[right_start_idx], mag[right_end_idx]],
        color='red'
    )

display(fig)

inter_fg_3

Now let us use the line equation

\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}

Solving for x we get

x = x_1 + (x_2 - x_1) \frac{y - y_1}{y_2 - y_1}

# Left point interpolation

y1 = mag[left_start_idx]
y2 = mag[left_end_idx]
x1 = f[left_start_idx]
x2 = f[left_end_idx]

x = x1 + (x2 - x1) * (y - y1) / (y2 - y1)

ax.scatter([x], [y], color="green")
display(fig)

inter_fig_4

# Right point interpolation

y1 = mag[right_start_idx]
y2 = mag[right_end_idx]
x1 = f[right_start_idx]
x2 = f[right_end_idx]

x = x1 + (x2 - x1) * (y - y1) / (y2 - y1)

ax.scatter([x], [y], color="green")
display(fig)

inter_fig_5

I was able to export the whole thing as a PDF and send it to him.

Building a quick and dirty data collection app with React, Google Sheets and AWS S3

Covid-19 has created a number of challenges for society that people are trying to solve with the tools they have. One such challenge was creating an app to collect data from volunteers about the food supply requirements of their communities.

This needed a form with the following inputs:

  1. Some text inputs like the volunteer’s name, vehicle number, delivery address, etc.
  2. The location in geographic coordinates, so that the delivery person can launch Google Maps and drive to the place
  3. A couple of photos of the closest landmark and the delivery building.

Multiple ready-made solutions like Google Forms and Zoho Forms were attempted, but we hit a block when it came to having a map widget that would let the location be picked manually, and to uploading photos. After an insightful experience with CovidCrowd, we were no longer in the mood to build a CRUD app with a database, servers, etc. So the hunt for a low-to-zero-maintenance solution resulted in a collection of pieces that work together like an app.

Piece 1: Google Script Triggers

Someone has successfully converted a Google Sheet into a writable database (sort of) with some Google Script magic. This allows any form to be submitted to the Google Sheet, with the information stored in columns like in a database. This solved two issues: no need for a database, and no need for a back-end interface to access the data.

Piece 2: AWS S3 Uploads from Browser

The AWS JavaScript SDK allows direct upload of files into buckets from the browser using Cognito credentials and a Pool ID. Now we can upload the images to the S3 bucket and send the URLs of the images to the Google Sheet.

Piece 3: HTML 5 Geolocation API and Leaflet JS

Almost 100% of this data collection is going to happen via a mobile phone, so we have a high chance of getting the location directly from the browser using the native Geolocation API. In case the device location is not available or the user has denied location access, a LeafletJS widget is embedded in the form with a marker which the user can move to the right location manually. This is also sent to the Google Sheet as a Google Maps URL with the lat-long.

Piece 4: Tying it all together – React

All of this was tied together into a React app using React Hook Form, with data validation and custom logic that orchestrates the location, file upload, etc. When the app is built, it results in an index.html file and a bunch of static CSS and JS files, which can be hosted for free on GitHub Pages or on an existing server as a subdirectory, maybe even served gzipped over a CDN, because there is nothing to be done on the server side.

We even added things like an image preview in the form, so the user can see the photos they are uploading.

resource_form

Architecture Diagram

resource_form_architecture

Caveats

  1. Google Script Trigger Limits – There is a limit to how many times the Google Script can be triggered
  2. AWS Pool ID exposed – the Pool ID with write capabilities is exposed to the world. Someone smart enough could turn your S3 bucket into their free storage, or, if you have enabled DELETE access, you could lose your data as well.
  3. DDoS and spam – there are also other considerations, like spamming via the Google Script trigger, or a DDoS that exhausts the limits by hitting the Google Script URL with random requests.

All of these are overlooked for now, as the volunteers involved are trusted and the URL is not publicly shared. Moreover, the entire lifetime of this app might be just a couple of weeks. For now, this zero-maintenance architecture allows us to collect the custom data that we want.

Conclusion

Building this solution showed me how problems can be solved without having to write a CRUD app with an admin dashboard every time. Sometimes a Google Sheet might be all that we need.

Source Code: https://github.com/tecoholic/ResourceForm

PS: Did you know Covid19India.org is just a single Google Sheet and a collection of static files on GitHub Pages? It serves 150,000 to 300,000 visitors at any given time.

Simplifying a Factory Pattern function that has grown complex

This is a combination of the problem that I posted on Dev.to and StackExchange, and the final solution that I adopted.

The Problem

I have a function which takes the incoming request, parses the data, performs an action and posts the results to a webhook. This runs in the background as a Celery task. This function is a common interface for about a dozen processors, so it can be said to follow the Factory Pattern. Here is the pseudo code:

processors = {
    "action_1": ProcessorClass1, 
    "action_2": ProcessorClass2,
    ...
}

def run_task(action, input_file, *args, **kwargs):
    # Get the input file from a URL
    log = create_logitem()
    try:
        file = get_input_file(input_file)
    except:
        log.status = "Failure"

    # process the input file
    try:
        processor = processors[action](file)
        results = processor.execute()
    except:
        log.status = "Failure"

    # upload the results to another location
    try:
        upload_result_file(results.file)
    except:
        log.status = "Failure"

    # Post the log about the entire process to a webhoook
    post_results_to_webhook(log)

This has been working well for the most part, as the inputs were restricted to action and a single argument (input_file). As the software has grown, the processors have increased in number and the input arguments have started to vary. All the new arguments are passed as keyword arguments, and the logic has become more like this:

try:
    input_file = get_input_file(input_file)
    if action == "action_2":
       input_file_2 = get_input_file(kwargs.get("input_file_2"))
except:
    log.status = "failure"


try:
    processor = processors[action](file)
    if action == "action_1":
        extra_argument = kwargs.get("extra_argument")
        results = processor.execute(extra_argument)
    elif action == "action_2":
        extra_1 = kwargs.get("extra_1")
        extra_2 = kwargs.get("extra_2")
        results = processor.execute(input_file_2, extra_1, extra_2)
    else:
        results = processor.execute()
except:
    log.status = "Failure"

Adding the if conditions for a couple of things didn’t make much difference, but now almost 6 of the 11 processors have extra inputs specific to them, the code is starting to look complex, and I am not sure how to simplify it. Or whether I should attempt to simplify it at all.

Things I have considered:
1. Creating a separate task for the processors with extra inputs – but this would mean repeating the file fetching, logging, result upload and webhook code in each task.
2. Moving the file download and argument parsing into the BaseProcessor – this is not possible, as the processor is used in other contexts without the file download and webhooks as well.

The solution

I solved it by making two important changes:

  1. Normalised the processors by making the common arguments positional and everything else keyword based. This allows me to pass the kwargs along as I receive them, without unpacking; picking out what it needs is the processor’s job (a sketch of such a processor follows the code below).
  2. For the extra files, make a copy of the kwargs and replace the remote file URL with the local file location. This way, the extra files are part of the kwargs dict itself.

def run_task(action, input_file, *args, **kwargs):

    params = kwargs.copy()

    # Get the input file from a URL
    log = create_logitem()
    try:
        file = get_input_file(input_file)
        if action == "action_2":
            params["extra_file"] = get_input_file(kwargs["extra_file"])  # update the files in params
    except:
        log.status = "Failure"

    # process the input file
    try:
        processor = processors[action](file)
        results = processor.execute(**params)   # Unpack and pass the params
    except:
        log.status = "Failure"

    # upload the results to another location
    try:
        upload_result_file(results.file)
    except:
        log.status = "Failure"

    # Post the log about the entire process to a webhoook
    post_results_to_webhook(log)

Now I have the same lean structure that I originally had. The only processor-specific code is the file downloads, which I think I can live with for now.
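For illustration, here is a minimal sketch of what a processor could look like under this scheme. The class name and the extra arguments are hypothetical, not the actual project code:

class ProcessorClass2:
    """Hypothetical processor following the normalised signature."""

    def __init__(self, file):
        # the common argument stays positional
        self.file = file

    def execute(self, **kwargs):
        # everything processor-specific arrives as keyword arguments;
        # picking out what it needs is this processor's job
        extra_file = kwargs.get("extra_file")   # hypothetical extra input
        extra_1 = kwargs.get("extra_1")         # hypothetical extra input
        # ... the actual processing would happen here ...
        return {"file": self.file, "extra_file": extra_file, "extra_1": extra_1}

With this shape, run_task can call processor.execute(**params) uniformly, no matter which processor the factory picked.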

Credits

Kain0_0‘s answer pointed me in the right direction and helped me simplify it in a way that makes sense.

Employing VueJS reactivity to update D3.js Visualisations – Part 2

In Part 1, I wrote about using Vue’s reactivity directly in the SVG DOM elements and also pointed out that it could become difficult to manage as the visualisation grew in complexity.

We used D3 utilities for computation and Vue for the state management. In this post we are going to use D3 for both computation and state management with some help from Vue.

Let us go back to our original inverted bar chart and the code where we put all the D3 stuff inside the mounted() callback.

I am going to add a button to the interface so we can generate some interactivity.

<template>
  <section>
    <h1>Simple Chart</h1>

    <button @click="updateValues()">Update Values</button>

    <div id="dia"></div>
  </section>
</template>

… and define the updateValues() inside the methods in the script

export default {
  name: 'VisualComponent',
  data: function() {
    return {
      values: [1, 2, 3, 4, 5]
    }
  },
  mounted() {
    // all the d3 code in here
  },

  methods: {
    updateValues() {
      const count = Math.floor(Math.random() * 10)
      this.values = Array.from(Array(count).keys())
    }
  }
}

Now, every time the button is clicked, a random number of elements (0 to 9) will be set as the values property of the component. Time to make the visualization update automatically. How do we do that?

Using Vue Watchers

Watchers in Vue give us a way to track changes to values and do custom things when they change. We are going to combine that with our knowledge of D3’s joins to update our visualization.

First I am going to make a couple of changes so we can access the visualization across all the functions in the component. We currently have this

 mounted() {
    const data = [1, 2, 3, 4, 5]
    const svg = d3
      .select('#dia')
      .append('svg')
      .attr('width', 400)
      .attr('height', 300)

    svg
      .selectAll('rect')
      .data(data)
      .enter()
      ...
 }

  1. We are going to remove the data constant and replace it with this.values. This will allow us to access the data from anywhere in the component.
  2. We are going to track the svg as a component data value instead of a local constant.
  ...
  data: function() {
    return {
      values: [1, 2, 3, 4, 5],
      svg: null  // property to reference the visualization
    }
  },
  mounted() {
    this.svg = d3
      .select('#dia')
      .append('svg')
      .attr('width', 400)
      .attr('height', 300)

    this.svg
      .selectAll('rect')
      .data(this.values)
      .enter()
      ...

Now we can access the data and the visualization from anywhere in the Vue Component. Let us add a watcher that will track the values and update the visualization

export default {
  ...
  watch: {
    values() {
      // Bind the new values array to the rectangles
      const bars = this.svg.selectAll('rect').data(this.values)

      // Remove any extra bars that might be there
      // We will use D3's exit() selection for that
      bars.exit().remove()

      // Add any extra bars that we might need
      // We will use D3's enter() selection for that
      bars
       .enter()
       .append('rect')
       .attr('x', function(d, i) {
         return i * 50
       })
       .attr('y', 10)
       .attr('width', 25)
       .attr('fill', 'steelblue')
       // Let us set the height for both existing and new bars
       .merge(bars)
       .attr('height', function(d) {
         return d * 50
       })

    }
  }
}

There we have it – a visualization that will update based on the user’s interaction.

Updating_D3_with_Vue

Notes

  1. If we compare this technique to the previous one, it does seem like we are writing more verbose JavaScript than necessary. But if you have written D3 at all, you will find this verbose JS easier to manage than the previous approach.
  2. Performance – one concern when switching from Vue’s direct component reactivity to DOM-based updates using D3 is performance. I don’t have a clear picture on that matter. But the good thing is, D3’s update mechanism changes only what is necessary, similar to Vue’s update mechanism. So I don’t think we will be far off when it comes to performance.
  3. One important advantage of this method is that we can make use of the animation capabilities that come with D3.js.

Employing VueJS reactivity to update D3.js Visualisations – Part 1

In the previous post I wrote about how we can add D3.js Visualizations to a Vue component. I took a plain HTML, JavaScript D3.js viz and converted it to a static visualization inside the Vue component.

One of the important reasons why we use a modern framework like VueJS is to build dynamic interfaces that react to user inputs. Now in this post, let us see how we can leverage that to create dynamic visualisations that react to changes in the underlying data as well.

Points to consider

Before we begin let us consider these two points:

  1. VueJS components are not DOM elements, they are JS objects that are rendered into DOM elements
  2. D3.JS directly works with DOM elements.

So what this means is that we can manipulate the DOM (which the user is seeing) using either Vue or D3. If the DOM elements of our visualisation are created using Vue, then any changes to the underlying data will update the DOM automatically because of Vue’s reactivity. On the other hand, if we create the DOM elements using D3, then we will have to update them with D3 as well. Let’s try both.

Using Vue Directly

Let us take our simple inverted bar chart example.

simple_d3_chart

Here the output SVG will be something like this:

inv_bar_dom

We have created one rectangle per data point, with its x position and the height calculated dynamically using D3. Let us replicate the same with Vue.

I am going to change the template part of the component:

<template>
  <section>
    <h1>Simple Chart</h1>

    <div id="dia">
      <svg width="400" height="300">
        <g v-for="(value, index) in values" :key="value">
          <rect
            :x="index * 50"
            y="10"
            width="25"
            :height="value * 50"
            fill="steelblue"
          ></rect>
        </g>
      </svg>
    </div>

  </section>
</template>

The important lines to note are the following:

  1. <g v-for... – In this line we loop through the data points with g tag acting as the container (like a div)
  2. :x="index * 50" – Here we are calculating the position of the rectangle based on the index of the value
  3. :height="value * 50" – Here we calculate the height of the rectangle based on the value.

With this we can write our script as:

export default {
  name: 'VisualComponent',
  data: function() {
    return {
      values: [1, 2, 3, 4, 5]
    }
  }
}

Now this creates the exact same chart. If these values were ever to change through user interaction, the bar chart would update automatically. We don’t even need D3.js at this point. This also allows us to do cool things like binding Vue’s event handlers (e.g., @click) to SVG objects.

But here is the catch: this works for simple charts and for examples. Our real visualizations will be much more complex, with lines, curves, axes, legends, etc., and trying to create these things manually will be tedious. We can make it easier to a certain degree by using D3 utilities inside computed properties, like this:

import * as d3 from 'd3'

export default {
  ...

  computed: {

    paths() {
      const line = d3.line()
        .x(d => d.x)
        .y(d => d.y)
        .curve(d3.curveBasis)
      return this.values.map(v => line(v))
    }

  }
  ...
}

and use it like this:

<template>
...

    <g v-for="path in paths">
      <path :d="path" stroke-width="2px" stroke="blue"></path>
    </g>

...

This way we are converting the values into SVG Path definitions using D3 and also using Vue’s reactivity to keep the paths updated according to the changes in data.

This improvement will also become unusable beyond a certain limit, because:

  1. We are not just thinking about the “what I need” of the visualization, we are also thinking about the “how do I” part for each of the “what I need” parts. This makes the process excessively hard, almost negating the purpose of D3.
  2. This will soon become unmanageable, because the binding between the data and the visual is spread between the DOM nodes inside the template and the computed properties and methods. This means any update needs work in two places.

For these reasons, I would like to let everything be controlled by D3.js itself. How do I do that? Read about it in Part 2.

Adding D3.js Visualisations to VueJS components

D3.js is an amazing library for creating data visualizations, but it relies on manipulating the DOM elements of the web page. When building a website with VueJS, we think in terms of reactive components and not in terms of static DOM elements. Successfully using D3.js in Vue components depends on a clear understanding of the Vue life cycle, because at some point the reactive component becomes a DOM element that we see in the browser. That is when we can start using D3.js to manipulate our DOM elements.

Let us start with a simple example.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Simple Example</title>
    <script src="https://d3js.org/d3.v5.min.js"></script>
</head>
<body>
    <h1>Simple Example</h1>
    <div id="dia"></div>

    <script>
        const data = [1, 2, 3, 4, 5]
        var svg = d3.select('#dia')
          .append('svg')
          .attr('width', 400)
          .attr('height', 300)

        svg.selectAll('rect')
          .data(data)
          .enter()
          .append('rect')
          .attr('x', function(d, i) {
              return i * 50
          })
          .attr('y', 10)
          .attr('width', 25)
          .attr('height', function(d) {
              return d * 50
          })
          .attr('fill', 'steelblue')

    </script>
</body>
</html>

Now this will give us an inverted bar graph like this:

simple_d3_chart

Doing the same in a Vue Component

The first step is to include the d3.js library into the project.

yarn add d3 
# or npm install d3

Then let us import it into our component and put our code in. The confusion starts with where to put the code, because we can’t just put it into a script tag like in an HTML file. Since Vue components export an object, we will have to put the code inside one of the object’s methods. Vue has a number of lifecycle hooks that we can use for this purpose, like beforeCreate, created, mounted, etc. Here is where knowledge of the Vue component life-cycle comes in useful. If we look at the life-cycle diagram from the documentation, we can see that the mounted() callback function is called when the full DOM becomes available to us.

vue_cycle_mounted

So, mounted() seems to be a good place to put our D3.js code. Let us do it.

<template>
  <section>
    <h1>Simple Chart</h1>
    <div id="dia"></div>
  </section>
</template>

<script>
import * as d3 from 'd3'

export default {
  name: 'VisualComponent',
  mounted() {
    const data = [1, 2, 3, 4, 5]
    const svg = d3
      .select('#dia')
      .append('svg')
      .attr('width', 400)
      .attr('height', 300)

    svg
      .selectAll('rect')
      .data(data)
      .enter()
      .append('rect')
      .attr('x', function(d, i) {
        return i * 50
      })
      .attr('y', 10)
      .attr('width', 25)
      .attr('height', function(d) {
        return d * 50
      })
      .attr('fill', 'steelblue')
  }
}
</script>

<style></style>

Now this shows the same graph that we saw in the simple HTML page example.

Next

  1. How to use Vue’s reactivity in D3.js Visualizations in Vue Components? – Part 1
  2. How to use Vue’s reactivity in D3.js Visualizations in Vue Components? – Part 2

Lottie – Amazing Animations for the Web

15549-no-wifi

Modern websites come with some amazing animations. I remember Sentry.io used to have an animation that showed packets of information going through a system and getting processed in a processor. If you browse Dribbble you will see a number of landing page animations that just blow your mind. The most mainstream brand that employs animations is Apple: their web page was a playground when they launched Apple Arcade.

Sidenote: Sadly, all these animations vanish once the pages are updated. It would be cool if they could be saved in some gallery where we can view them at later points in time.

We were left wondering: how do they do it?

animation_discussion

I might have found the answer to this. The answer could be Lottie.

What is Lottie? The website says

A Lottie is a JSON-based animation file format that enables designers to ship animations on any platform as easily as shipping static assets. They are small files that work on any device and can scale up or down without pixelation.

Go to their official page here to learn more. It is quite interesting.

Take a peek at the gallery as well; there are some interesting animations that can be downloaded and used in websites for free.