Learning Rust – Day 1

Dev notes from learning Rust.

Official Website: https://www.rust-lang.org

Notes

  • Rust is a compiled language – different from my daily drivers Python and JavaScript both interpreted languages.
  • Compiled means re-learning the difference between passing by reference and passing by value
  • Rust has mutability at the core of data management – so being conscious about immutable and mutable data
  • Errors are handled as a part of the Result of functions. Doesn’t need an explicit try...except (at least at this point)
  • Packages are called crates and there are binary crates & library crates
  • https://crates.io/ is the package registry
  • Cargo.lock file acts as the record of dependencies for Reproducible builds
  • cargo update updates the packages to the most recent bugfix version
  • there is something called Traits which provide access to functions of a crate.

Basics of Investing – A conversation in Tamil

A couple of weeks back, I hosted a space on Investments where Mohanakrishnan covered the first steps on an investment journey. This covered things like:

  • Emergency Fund
  • Health Insurance
  • Asset allocation and..
  • diversification

The snippets from the conversation are listed here.

Stepping into 2021

This is probably the most remarkable New Year in my life. I am finally giving up 5 years of successful (personally) freelancing and consulting work and will be a full time employee of a company.

I am joining Times Internet group for a data driven journalism team they are setting up and Ritvvij thinks I can contribute a lot and I am excited about it as much as I can be given the circumstances. **turns and glares at 2020**

Some background

You see, I saw this TED talk of Sir Tim Berners Lee way back in 2010 and got excited about OpenStreetMap and have been making maps since then. If you follow me on Twitter, then you perhaps would agree that I have had some success with it.

During 2010 – 13, my dream job was to become a data visualisation developer for a newspaper. TED Talks on that subject were watched and rewatched so many times – I miss you Hans Rosling. Unfortunately I never actually become a data visualisation expert. It remained a hobby – notice the “Visualization” on the top menu.

… and now

I think what brought attention was me trying to make NER based data extraction a commercial tool. Ritvvij DMed me to provide feedback on it and a few weeks later DMed again to hire me for Times Internet.*

Suddenly, in the last week of the crappy year that 2020 has been, I found myself holding a role that I had dreamt of almost 10 years ago.

Tell me this is not a remarkable new year.

* Well, not exactly. I did go through a two day paid test where I did some work to prove that I am suitable and impressed at least one other person with my data visualisation attempts.

some thoughts

I am happy I get to build software for things that I do as a hobby and get paid for it. I think I can call myself lucky in that sense, even though I might have to find a new hobby, the old hobby becoming work and all.

I have spent hours agonising over the frameworks I am not confident about (hello Django, hey NodeJS), the new cool features I am missing out on (what version are you now on React? 16, 17, 99?), the amazing new paradigms that are opening up (nocode.tech), questioned the future as a Python developer and how what I know doesn’t fit into 100s of job descriptions that I skip across. It is remarkable that a job found me without asking any of that.

This is similar to the instance when 40 lines of JavaScript made it to The Next Web and Lifehacker .

This has brought in the realisation that, while I might not be a 10x engineer who can reverse a binary tree for breakfast, I certainly can create solutions that people find valuable / useful / interesting at some level.

And for that I am happy.

Happy New Year 2021 – Cheers to New Beginnings.

Backup all the files in a directory to Azure Cloud

I had to copy all the files from the home directory into a Azure Blob container today. All the regular folders without any of the dot files and dot folders.

Azure CLI provides batch upload functionality to upload folders. But there are two issues I faced:

  1. I needed to copy all the folders – and I didn’t want to run the command for each folder.
  2. I wanted to preserve the folder structure in the container as well.

After some trial and error I settled on this one liner.

for f in */; do az storage blob upload-batch -d container-name/$f -s $f; done;

For loop take care of #1 and using the /$f takes care of creating corresponding folders to preserve the same folder structure as in my home directory.

This assumes you already have set the AZURE_STORAGE_ACCOUNT and the AZURE_STORAGE_KEY environment variables for authentication.

NER Annotator / NER Tagger for Spacy

Background

As with most things, this started with a problem. Dr. K. Mathan is an Epidemiologist tracking Covid-19. He wanted to automated extraction of details from government bulletins for data collection. It was a tedious manual process of reading the bulletins and entering the data by hand. Since the bulletins has paragraphs of text with text in them, I was looking to see if I can leverage any NLP (Natural Language Processing) tools to automate parts of it.

Named Entity Recognition

The search led to the discovery of Named Entity Recognition (NER) using spaCy and the simplicity of code required to tag the information and automate the extraction. It kind of blew away my worries of doing Parts of Speech (POS) tagging and then custom writing an extraction algorithm. So, copied some text from Tamil Nadu Government Covid Bulletins and set out test out the effectiveness of the solution. It worked pretty well for the small amount of training data (3 lines) vs extracted data (26 lines).

Trying out NER based extraction in Google Colab Notebook using spaCy

But it had one serious issue. Generating Training Data. Generating training data for NER Annotation is a pain. It is infact the most difficult task in the entire process. The library is so simple and friendly to use, it is generating the training data that is difficult.

Creating NER Annotator

NER Annotation is fairly a common use case and there are multiple tagging software available for that purpose. But the problem is they are either paid, too complex to setup, requires you to create an account or signup, and sometimes doesn’t generate the output in spaCy’s format. The one that seemed dead simple was Manivannan Murugavel’s spacy-ner-annotator. That’s what I used for generating test data for the above example. But it is kind of buggy, the indices were out of place and I had to manually change a number of them before I could successfully use it.

After a painfully long weekend, I decided, it is time to just build one of my own. Manivannan’s tagger just uses JavaScript to create the training data JSON and then requires a conversion using a Python Script before it can be used. I decided to make it a little more bug proof.

This version of NER Annotator would:

  1. Use a Python backend to tokenize and detokenize text for tagging and generating training data.
  2. The UI will let me select tokens (idea copied from Prodigy from the spaCy team), so I don’t have to be pixel perfect in my selections.
  3. Generate JSON which can be directly loaded instead of having to post-process it with Python script.

The Project

I created the NER with the above goals as a Free and Open Source project and released it under MIT License.

Github Link: https://github.com/tecoholic/ner-annotator

Credits

Thanks to Philip Vollet noticing it and sharing it on LinkedIn and Twitter, the project has gotten about 107 stars on Github and 14 forks, which is much more reach than I hoped for.

Thanks to @1littlecoder for making a YouTube video of the tool and showing the full process of tagging some text, training data and performing extractions.

Featured on TheNextWeb & Lifehacker

Something really cool happened this week. I will let the tweets to take over.

… and that’s how I made it to the homepage of TheNextWeb.

… and Lifehacker

Source code of the extension: https://github.com/tecoholic/Just-Arrived

For Chrome: Chrome Webstore

For Firefox: https://addons.mozilla.org/en-GB/firefox/addon/just-arrived-ff/

What did I learn from this?

The most important thing I learnt while doing this is probably the fact that the extension architecture is standardised across Chrome and Firefox. Thanks to Shrinivasan for asking me to port it to Firefox.

But, I think the relationship is one sided. Firefox can work with extensions written for Chrome, but Chrome won’t work with extensions written for Firefox. This is due to the nature of Firefox’s API and the fallback it offers.

For example, the storage api on Firefox is storage.* whereas on Chrome it is chrome.storage.*. Since Firefox has fallbacks for all the chrome.* API, the code primarily written for Chrome works without modifications on Firefox. But if a developer writes the plugin first for Firefox, it would lose the namespacing and therefore won’t work.

More technical details here at MDN web docs: Building a cross-browser extension

Special thanks @tshrinivasan for pushing me to build it for Firefox to @SuryaCEG for the UX advised and @IndianIdle for writing the article.

Rediscovering Web Development

Despite the rapid advancement of tools and specifications, web development has pretty much involved the same set of actions for a developer.

It all starts with an index.html file. An entire decade has been spent on just creating different ways to generate this file. From writing each line of HTML by hand to creating it on the fly using a shadow DOM we have come a long way.

Then comes the data that sits on these HTML files and followed by the interaction. The data flows from a data-source. Usually a database. The developer’s first task is to figure out a way to get this data from the database on the HTML. This is where a lot of action goes in. So much so that the quintessential phrase “Web Developer” has been replaced by “Frontend” and “Backend” developer.

But that’s all an old story, a thing of the past, a bygone era, remnants of an old civilization…. (looks dreamily into the void). Alight, I might be exaggerating a tiny bit, but I am in no way wrong.

The emergence of Low Code and No-Code Tools

As with everything these days, I stumbled into the world of low-code or no-code tools due to COVID-19. When India had only a handful of cases and people were trying to build a website to track the pandemic, I built a web app using Python Django which quickly became irrelevant and costly to run.

Instead, a live dashboard was built using Google Sheets as the database and API. A simple HTML page with some React code became the frontend. It was fascinating to see the innovative way in which the entire thing was working. At one point I saw 89 people actively editing the sheet, updating data, and 100,000 others consuming it on the dashboard. It changed the perception of web development completely.

Future of Web Development

Ever since I have been intrigued by this notion of building things on the web without the traditional idea of database-backend-frontend. While all of those things might exist, it is abstracted away in a way you are not actively working with a setting up VMs, database server, compiling code – that sort of nuts and bolts.

https://nocodelist.co/ is a website that lists tools that allow one to do that. It has tools for almost any task conceivable.

  • You want to draw the UI adjust colors and generate Angular code that you can just add business logic to – UI Bakery has you covered
  • You won’t convert your Google Sheets into a Database with a REST API – there are at least 8 apps for that
  • How about a drag and drop UI builder that you can directly hook into your database without a backend server? – AppSmith will let you do that

In each of the instances above, there is a lot of details that are abstracted away into the tool. Only the part which actually requires human decision making is left for the developers to decide.

Just as we moved from writing HTML by hand to building shadow DOMs using JavaScript, we might be moving from writing any JS to defining just the logic.

Conclusion

For a modern frontend developer, HTML is a second thought. The frameworks like React, Vue, Angular have created a way for us to think about the web in terms of components instead of HTML tags. We might soon be thinking about web apps in terms of user-flows while the frameworks will deal with the components.

You could even say, we already do and that I am late to realize it.

Strapi – Optimizing REST API responses by preventing auto-population of relations

Strapi is an Open Source headless CMS based on NodeJS. It provides the backend admin tools to quickly create an API – both REST and GraphQL.

This is a mini series which outlines:

  1. Setting up Strapi and creating an API
  2. Adding ownership control to the API endpoints
  3. Optimising REST API responses – (you are here)

So far …

We have created a REST API for an expense management application with category support. We have JWT token based auth which came with Strapi to authenticate users. We have implemented the IsOwner policy in the controllers to restrict data access.

Optimizing the Responses while

The API by default automatically populates relationships and sends in all the related data. It is very useful for some cases and completely a overkill for others. Take the following for example.

I have setup 5 Expense Items in the Admin dashboard for our test_user. 4 of them are under the category ‘Travel’ and one of them is under the category ‘Food’. Now when we fetch the categories, let us see what we get.

When making a GET request to /categories, we are not only getting the categories but also all the expense items which are under every category. When a user has thousands of expense items, we cannot be querying the DB for all of them whenever a GET request is made to categories. That would cause serious performance issues.

Preventing auto-population of relations

We can turn this off by setting the autoPopulate flag to false in the model.

  • Open the file /api/category/models/category.settings.json
  • Add the line "autoPopulate": false in the the expense-items block as shown below
  • Let us also disable auto-population of the user. We have already implemented the IsOwner policy for all requests, so only the owner is going to be requesting their own categories and the user field is redundant data.
{
  "kind": "collectionType",
  "collectionName": "categories",
  "info": {
    "name": "Category"
  },
  "options": {
    "increments": true,
    "timestamps": true
  },
  "attributes": {
    "name": {
      "type": "string",
      "required": true,
      "minLength": 2
    },
    "color": {
      "type": "string"
    },
    "user": {
      "plugin": "users-permissions",
      "model": "user",
      "via": "categories",
      "autoPopulate": false
    },
    "expense_items": {
      "via": "category",
      "collection": "expense-item",
      "autoPopulate": false
    }
  }
}

Now as soon as we save the file, the Strapi dev server should restart. Now we can run the same GET /categories request to verify the results.

There is no expense items in the response. Just the categories.

We can use this method to turn of auto population of any relation in any of the Content Types we have created. This way the API returns only what we intend it to return.

Optimising the Login response

Let us take a look at the login response.

We can see that it contains all the categories and expense items of the user. This would put disastrous load on the system as the data size grows. So, let us turn off auto-populate for the users as well.

  • Open /extensions/user-permissions/models/User.settings.json
  • Scroll to the bottom and add "autoPopulate": false to the entries categories and expense_items

Now, let us login again and check the response.

No categories or expense items in the response, just the JWT token user object and the roles. Now every time a user logs in Strapi won’t be querying the database for everything related to the user.

Conclusion

This concludes this mini series. By applying the changes presented in this series, Strapi can be used a REST API backend not just for CMS purposes with strong public frontend, but also as a good backend for User focused web applications.

In my journey as a web developer, Strapi blew my mind the same way Django did almost a decade back with its built-in Admin UI. The amount of power Strapi packs right off the box is amazing.

Strapi – Adding IsOwner Policy to the API

Strapi is an Open Source headless CMS based on NodeJS. It provides the backend admin tools to quickly create an API – both REST and GraphQL.

This is a mini series which outlines:

  1. Setting up Strapi and creating an API
  2. Adding ownership control to the API endpoints – (you are here)
  3. Optimising REST API responses

So far…

We have created models for the app and have an API setup which works with JWT Authentication without a single line of code. But we have an issue, any authenticated user can read every other user’s data.

This can be rectified using setting up an access policy in the Model’s controller file.

Side Note: Strapi’s Policies section explains how to implement them and configure them for routes here. But that doesn’t work for IsOwner policy because ownership is object specific and thus has to be implemented in the controller instead of policy configuration.

Writing the Is Owner Policy

We will be using the Create is owner policy document as our reference material to update our API. I will be repeating all of it here with a little more information.

We have two models defined so far – Category & ExpenseItem. I am going to implement the IsOwner policy for Category and leave ExpenseItem as an exercise. Now, let’s go write some code.

  • Let us open our text editor and open the api/category/controllers/category.js file
  • It should have the following content
'use strict';

/**
* Read the documentation (https://strapi.io/documentation/v3.x/concepts/controllers.html#core-controllers)
* to customize this controller
*/

module.exports = {};

All of our code will go into the braces. We will be adding 6 functions:

  1. create – the function executed when a new category is created. Here we will make sure that any newly created category is automatically assigned to the user creating the category.
  2. find – the function execute when all the objects are listed. For eg., /categories. When a user requests categories, we will filter the results such that the user receives only their’s and not others’.
  3. findOne – Same as the one above when a single object is accessed with id like /categories/1
  4. count – the count of objects in a model. We will be counting only the objects created by a particular user
  5. update – update a specific object. Only the owner should be able to do it
  6. delete – delete a specific object. Only the owner should be able to delete an object.

The above will cover the 4 CRUD operations and the 2 extra ones (listing and counting).

Create function

const { parseMultipartData, sanitizeEntity } = require("strapi-utils");

module.exports = {
/**
* Create a new Category
*
* @param {*} ctx The Request Context
*/
async create(ctx) {
let entity;

if (ctx.is('multipart')) {
const { data, files } = parseMultipartData(ctx);
data.user = ctx.state.user.id;
entity = await strapi.services.category.create(data, { files });
} else {
ctx.request.body.user = ctx.state.user.id;
entity = await strapi.services.category.create(ctx.request.body);
}
return sanitizeEntity(entity, { model: strapi.models.category });
},
};

Strapi is built on Koa.js and thus uses async await instead of callbacks. Let us break down the logic of the function and see what’s happening.

  • the function is passed in the Request context which has the all the request related information like the form data, the user identified in the request (from our JWT token)..etc.,
  • we check if it is a multipart form which would mean we have uploaded files to deal with.
  • we parse the form for data and files, set the user as the user executing the request and use the Strapi service for our model to create a new entity.
  • if it is not a multipart form, then we just set the user of the request as an extra field into the request data and create the category using the Strapi service
  • finally we need to return the new category as a response – We use strapi’s function sanitizeEntity to pass in the newly created entity and the model.

Side Note: I know the Category model doesn’t have any files attached to it and the IF block with ‘multipart’ check is not necessary in this situation. But I am leaving it here for two reasons:

  1. If in the future, we want to support logos or some form of header image for the Category model, we don’t have to come back and update the code again.
  2. This might act as a reference implementation for someone reading the blog and might use it on a model with files and don’t want them to wonder files are not getting saved.

If you really want to have a lean code base then the just 3 lines would be sufficient

module.exports = {
/**
* Create a new Category
*
* @param {*} ctx The Request Context
*/
async create(ctx) {
ctx.request.body.user = ctx.state.user.id;
let entity = await strapi.services.category.create(ctx.request.body);
return sanitizeEntity(entity, { model: strapi.models.category });
},
};

Testing the create logic

Now we can re-run the POST request to create a new category in POST and verify that the user is automatically set for the category.

strapi_new_category_with_user

Update function

Now that we have the create function auto assigning the user for the categories, let us implement restrictions for updates.

/**
* Update a category
*
* @param {*} ctx the request context
*/

async update(ctx) {
const { id } = ctx.params;

let entity;

// Find the category matching the ID and the user
const [category] = await strapi.services.category.find({
id: ctx.params.id,
'user.id': ctx.state.user.id,
});

if (!category) {
return ctx.unauthorized(`You can't update this entry`);
}

// Update the category
if (ctx.is('multipart')) {
const { data, files } = parseMultipartData(ctx);
entity = await strapi.services.category.update({ id }, data, {
files,
});
} else {
entity = await strapi.services.category.update({ id }, ctx.request.body);
}

return sanitizeEntity(entity, { model: strapi.models.category });
},

The update function adds an extra step of fetching the category and making sure that the category with that ID and userID exists before reading the request data. If the category doesn’t exist, then it returns a Unauthorized error. If it exists, then it updates the category and returns the updated information.

We can verify it with a PUT request to http://localhost:1337/categories/

strapi_update_category

Notice that the Food category has now been updated to Food & Drinks. Not just that, using the JWT token of the intruder user wouldn’t work either.

strapi_update_unauthorized

Find function

/**
* List all the categories beloinging to the requesting user
*
* @param {*} ctx the request context
*/

async find(ctx) {
let entities;

if (ctx.query._q) {
entities = await strapi.services.category.search({
...ctx.query,
'user.id': ctx.state.user.id
});
} else {
entities = await strapi.services.category.find({
...ctx.query,
'user.id': ctx.state.user.id
});
}

return entities.map(entity => sanitizeEntity(entity, { model: strapi.models.category }));

}

The find function checks if the query is a search query or a filter and calls the corresponding function. We also pass the 'user.id' of the requesting user along with other query params from the request to filter the search results. Now when we request the url http://localhost:1337/categories, the response contains only the objects of the requesting user.

strapi_get_categories_test_user

Now let us see what we get when we request as a different user

strapi_get_categories_intruder

FindOne function

/**
* Get the category with a specific ID
*
* @param {*} ctx the request context
*/
async findOne(ctx) {
const { id } = ctx.params;

const entity = await strapi.services.category.findOne({ id, 'user.id': ctx.state.user.id });

if (!entity) {
return ctx.unauthorized(`You can't view this entry`);
}

return sanitizeEntity(entity, { model: strapi.models.category });
},

Fetching the category with id=3 as the owner (test_user)

strapi_get_one_category_test_user

Trying to get test_user’s category as the intruder

strapi_get_one_category_intruder

Count function

/**
* Count of the categories of the requesting user
*
* @param {*} ctx the request context
*/

count(ctx) {
if (ctx.query._q) {
return strapi.services.category.countSearch({
...ctx.query,
"user.id": ctx.state.user.id,
});
}
return strapi.services.category.count({
...ctx.query,
"user.id": ctx.state.user.id,
});
},

Count of test user

strapi_category_count

Delete function

/**
* Delete a record
*
* @param {*} ctx the request context
*/
async delete(ctx) {
const [category] = await strapi.services.category.find({
id: ctx.params.id,
"user.id": ctx.state.user.id,
});

if (!category) {
return ctx.unauthorized(`You can't delete this entry`);
}

let entity = await strapi.services.category.delete({ id: ctx.params.id });
return sanitizeEntity(entity, { model: strapi.models.category });
},

Delete as a intruder – Unauthorized

strapi_delete_intruder

Delete as the owner – test_user

strapi_delete_test_user

Final code for the controller

Here is the complete controller code with all the functions.

'use strict';
const { parseMultipartData, sanitizeEntity } = require("strapi-utils");
/**
* Read the documentation (https://strapi.io/documentation/v3.x/concepts/controllers.html#core-controllers)
* to customize this controller
*/
module.exports = {
/**
* Create a new Category
*
* @param {*} ctx The Strapi Context
*/
async create(ctx) {
let entity;
if (ctx.is("multipart")) {
const { data, files } = parseMultipartData(ctx);
data.user = ctx.state.user.id;
entity = await strapi.services.category.create(data, { files });
} else {
ctx.request.body.user = ctx.state.user.id;
entity = await strapi.services.category.create(ctx.request.body);
}
return sanitizeEntity(entity, { model: strapi.models.category });
},
/**
* Update a category
*
* @param {*} ctx the request context
*/
async update(ctx) {
const { id } = ctx.params;
let entity;
// Find the category matching the ID and the user
const [category] = await strapi.services.category.find({
id: ctx.params.id,
"user.id": ctx.state.user.id,
});
if (!category) {
return ctx.unauthorized(`You can't update this entry`);
}
// Update the category
if (ctx.is("multipart")) {
const { data, files } = parseMultipartData(ctx);
entity = await strapi.services.category.update({ id }, data, {
files,
});
} else {
entity = await strapi.services.category.update({ id }, ctx.request.body);
}
return sanitizeEntity(entity, { model: strapi.models.category });
},
/**
* List all the categories beloinging to the requesting user
*
* @param {*} ctx the request context
*/
async find(ctx) {
let entities;
if (ctx.query._q) {
entities = await strapi.services.category.search({
...ctx.query,
"user.id": ctx.state.user.id,
});
} else {
entities = await strapi.services.category.find({
...ctx.query,
"user.id": ctx.state.user.id,
});
}
return entities.map((entity) =>
sanitizeEntity(entity, { model: strapi.models.category })
);
},
/**
* Get the category with a specific ID
*
* @param {*} ctx the request context
*/
async findOne(ctx) {
const { id } = ctx.params;
const entity = await strapi.services.category.findOne({
id,
"user.id": ctx.state.user.id,
});
if (!entity) {
return ctx.unauthorized(`You can't view this entry`);
}
return sanitizeEntity(entity, { model: strapi.models.category });
},
/**
* Count of the categories of the requesting user
*
* @param {*} ctx the request context
*/
count(ctx) {
if (ctx.query._q) {
return strapi.services.category.countSearch({
...ctx.query,
"user.id": ctx.state.user.id,
});
}
return strapi.services.category.count({
...ctx.query,
"user.id": ctx.state.user.id,
});
},
/**
* Delete a record
*
* @param {*} ctx the request context
*/
async delete(ctx) {
const [category] = await strapi.services.category.find({
id: ctx.params.id,
"user.id": ctx.state.user.id,
});
if (!category) {
return ctx.unauthorized(`You can't delete this entry`);
}
let entity = await strapi.services.category.delete({ id: ctx.params.id });
return sanitizeEntity(entity, { model: strapi.models.category });
},
};
view raw category.js hosted with ❤ by GitHub

So far …

  • Created a project
  • Added the models
  • Setup the API
  • Added the IsOwner policy to the controller

Next

Let’s do a little bit of optimisation of the API responses. If we notice the GET request responses, the relations are always fully populated. For example, if we do a GET /categories each of these categories will have the user object in it. And if you add some expense items to a category, then all of those will be returned in the GET response as well. We will try to reduce this a bit and make it more streamlined in the next part.