Personal Bookmarking using YACY & yacy-it

A recent post on HackerNews titled Ask HN: Does anybody still use bookmarking services? caught my attention, specifically the top response, which mentioned a distributed search engine called YaCy.

The author of that response describes how he has configured it as a standalone personal search engine. The workflow is something like this:

  1. Browse the web
  2. Come across an interesting link that you need to bookmark
  3. Add the URL to the YACY Crawler and crawl with depth=0, which crawls just that page and indexes it.
  4. Next time you need it, just search for any word that might be present on the page.

This is brilliant because I don’t have to spend time putting it in the right folder (like in browser bookmarks) or tagging it with the right keywords (as I do in Notion collections). The full-text indexing takes care of this automatically.

But, I do have to spend time adding the URL to the YACY Crawler. The user actions are:

  • I have to open http://localhost:8090/CrawlStartSite.html
  • Copy the URL of the page I need to bookmark
  • Paste it in the Crawling page and start the crawler
  • Close the Tab.

Now, this is high friction. The mental load saved by not tagging a bookmark is easily eaten away by doing all of the above.


Since I like the YACY functionality so much, I decided to reduce the friction by writing a Firefox browser extension, yacy-it.

This extension uses the YACY API to start crawling the current tab’s URL when I click the extension’s icon next to the address bar.
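
To give a rough idea of what such an API call could look like, here is a minimal Python sketch that only builds the crawl-start URL and does not send it. It assumes YaCy's `Crawler_p.html` endpoint and the `crawlingMode`, `crawlingURL`, `crawlingDepth` and `crawlingstart` parameters. These names are assumptions and are not taken from the extension's source, so check them against your YaCy version. `build_crawl_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_crawl_url(page_url, yacy_base="http://localhost:8090", depth=0):
    """Return a URL asking YaCy to crawl page_url (endpoint and parameters are assumed)."""
    params = {
        "crawlingMode": "url",     # assumed: start the crawl from a single URL
        "crawlingURL": page_url,   # the page to bookmark
        "crawlingDepth": depth,    # 0 = index just this page
        "crawlingstart": "Start",  # assumed: flag that triggers the crawl
    }
    return f"{yacy_base.rstrip('/')}/Crawler_p.html?{urlencode(params)}"


crawl_url = build_crawl_url("https://example.com/article")
print(crawl_url)
```

Requesting the resulting URL, with whatever authentication your YaCy instance requires, would stand in for filling in the crawl form by hand.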

Note: If you notice error messages when using the addon, you might have to configure YaCy for CORS headers as described here

Add pages to YaCy index directly from the address bar
Right-click on a Link and add it to YaCy Index
If you are running YaCy in the cloud or on a different computer on the network, set its address in the Extension Preferences

Tip – Search your bookmarks directly from the address bar

You can search through YaCy-indexed links from your address bar by adding YaCy as a search engine in Firefox, as described here:

  1. Go to Settings/Preferences => Search and select “Add search bar in toolbar”
  2. Now Go to the YaCy homepage at http://localhost:8090
  3. Click the “Lens” icon to open the list of search engines
  4. This should now show the YaCy icon with a tiny + icon. Click that to add it as a search engine.
  5. Go back to search settings and select “Use the address bar for search and navigation” to hide the search box
  6. Scroll down to Search shortcuts -> double click the Keyword column next to YaCy and enter a keyword, e.g. @yacy or @bm
  7. Now you can search YaCy from the address bar like @yacy <keyword> or @bm <keyword> to look through your bookmarks.

Podcast Notes: Darren Palmer on the EV Revolution


I listen to podcasts while doing chores like cooking, travelling on public transport, doing dishes, etc. In most cases, I just absorb what I can with the partial attention I give the podcast. Recently I thought I would listen to some of them with a bit more attention, like a lecture, and take some notes. I learnt a lot from this one, from how EV range anxiety is a non-problem to how wineries hedge the quality of their wine by mixing grapes from different fields.


This one is a conversation between Bloomberg Opinion columnist Barry Ritholtz and Darren Palmer, Ford Motor Co.’s vice president for electric vehicle programs. They talk about Darren Palmer’s professional journey at Ford, Ford’s electric vehicle lineup, and the future of Ford and electric vehicles.

Ford Electric Vehicles

  • 11 Billion starting investment
  • 50 billion in current investment
  • use lessons from startups for velocity of execution
  • travelled to EV-rich countries, from Norway to China, to learn
  • Interesting case: an EV user declined an offer of a 100% refund on the EV plus a BMW to give up the EV and switch to a petrol car, because he felt he was holding the future and didn’t want to go back
  • focus on millennial market + Mustang brand
  • lack of Operating System is what was the limiting factor once the design was completed
  • using web technologies for creating the interface provided the team with great velocity
  • UI was hosted by a developer sitting at home, which was live updated during a test run based on customer complaint
  • from the first day, social media is being watched and feedback followed up
  • Team Edison – has a very flat structure
  • Ask what they want and move out of their way
  • fast track approvals and let them do their jobs

Converting Petrol Heads

  • Mustang club wasn’t comfortable and said they won’t be endorsing the EV Mustang
  • by the end of the launch, the presidents of both Mustang clubs had purchased multiple vehicles
  • realisation – EV provides complementary experience with Petrol vehicles
  • petrol heads are not petrol heads, but performance heads and electric vehicles can deliver better performance
  • EV Mustang has such a performance that the only thing needed to be done to convert a person into EV is just getting them into the EV vehicle
  • tests for the vehicles are no longer done with humans because the acceleration limits are too high
  • torture tests for Ford vehicles on YouTube

E Transit – Vans

  • Focus on commercial customers
  • Payback starts from day 1
  • running costs are about half
  • extra mileage is limited because of predictability of routes – so EV mileage anxiety is not an issue
  • trips are planned and executed with right charging window
  • only 10% of our market
  • everyday seeing new use-cases in the commercial sector

Use-case #1: French Winery

  • wineries collect grapes from different parcels of land and mix them to get a better wine on average; if 1 of 3 parcels is bad, they still get a decent wine because of the 2 good parcels
  • but they don’t know if they have had 2 of 3 bad until a year later when they pull samples from the vats
  • a winery wanted to get the best quality and decided to use 1 vat per parcel, so at t
  • vat sizes are big and restriction on building above ground, so winery built the vats under ground by a mountain side
  • used the diesel vehicles for transportation
  • converted entire fleet to the electrical after seeing electrical performance
  • talks to American group of wine makers
  • now there is a huge demand from the wineries for electrical vehicles

USE-CASE #2: Mobile kitchens

  • popup kitchens with electric equipment
  • uses the battery to drive to people’s homes, plug-in and start cooking
  • battery operated electrical vehicles can use the battery to power the kitchen

BlueOval Charging Network

  • have the resources to setup independent network
  • but poor value for customers with each manufacturer setting up their own networks – so created a coalition
  • there are regional networks
  • remote monitoring of all stations
  • payment systems, terminals, ..etc all have to play along
  • they have instrumented vehicles, roaming the network, just to test the charging stations
  • problems identified are communicated to the CEOs of the stations of the network
  • problematic stations can be removed from the network if the quality problems aren’t addressed

Long charge times

  • perception is very different
  • battery life on an iPhone is bad compared to a flip phone, yet no one wants to switch back to a flip phone after using an iPhone
  • the process of going to get petrol is obsolete
  • it actually takes 30 seconds to charge the car – because all you do is plug in
  • daily charging is almost never conscious – go home, plug in and walk away; it’s charged and ready to go when you come back to it again
  • tech to charge faster is still developing
  • but it makes less difference than we think because the human element plays a very big role in when/how we charge
  • with 300 mile cars (current range of Ford EVs), it’s not actually a requirement
  • users will take a break after driving for a couple of hours (~200 miles); those pit stops are more than enough to replenish the charge
  • large miles like 800-1000 miles

Recycling and reusability

  • materials are really in demand
  • if they don’t have access to minerals for batteries now, they are going to be in trouble
  • Blue City (Blue Ocean City? Blue Oval City??) – vertical integrated city for battery integration
  • collection of leftover and recycling to be more efficient in production

Buttons in the Cars

  • hardware switches/buttons are difficult to modify once installed
  • software provides the ability to change and modify
  • context level buttons – adaptable interface based on the context
  • parking camera buttons are not needed at 60 miles/hr – UI can hide them
  • smart vehicles can remember the way you park, so it will auto launch camera when on parking mode and can remember which camera is more frequently used and launch it automatically
  • smart UI with sensible human overrides
  • we think we need a button until we are presented with a better experience
  • no cycling around the cameras pressing the same button again and again

Customer education

  • basic problems because customers completely ignore the instructions
  • they just buy the car and use it without any research
  • so gamification kind of notifications to teach customers about nuances of an EV

Clean Python compiled files (.pyc) using py3clean

I recently ran into some issues with Python compiled files (__pycache__ folders and *.pyc files) not getting deleted when doing a git checkout. The files had been created when I mounted the folder as a volume in a Docker container, so they had different access rights than my current user. So I needed to use sudo to remove them recursively in the project.

That’s when I learnt of this cool new tool called py3clean. Simply run py3clean <folder> and it will remove all the Python compiled files recursively.
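
For a feel of what such a cleanup involves, here is a minimal Python sketch that removes `__pycache__` folders and stray `.pyc`/`.pyo` files under a directory. It is an illustration only, not py3clean's actual implementation, and `clean_compiled` is a hypothetical helper. The demo runs against a throwaway temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def clean_compiled(root):
    """Delete __pycache__ directories and stray .pyc/.pyo files under root."""
    root = Path(root)
    removed = []
    # Materialise each listing first so deleting does not disturb the traversal.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    for pattern in ("*.pyc", "*.pyo"):
        for compiled in list(root.rglob(pattern)):
            compiled.unlink()
            removed.append(compiled)
    return removed


# Demo on a temporary directory so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkg"
    (pkg / "__pycache__").mkdir(parents=True)
    (pkg / "__pycache__" / "mod.cpython-39.pyc").write_bytes(b"")
    (pkg / "old.pyc").write_bytes(b"")
    (pkg / "mod.py").write_text("print('hi')\n")
    removed = clean_compiled(tmp)
    leftovers = sorted(p.name for p in pkg.iterdir())

print(leftovers)
```

As with py3clean, files created by another user (for example from inside a container) would still need elevated rights to delete.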

Huawei – HarmonyOS & LiteOS combo system

I recently bought a smart watch that will let me go for a run without 350 grams of electronics (my phone) jiggling in my pocket at every step. It is an Honor Magic Watch 2. It’s probably the cheap & best option (costs about 12K) that supports onboard music storage. Now I can go for a run with just the watch and Bluetooth earphones 🙂

The LiteOS

Now the watch doesn’t support 3rd party apps like the pricier WearOS ones like the Samsung Galaxy Watches. How to make this thing smarter? Google lookup revealed that this thing is basically a clone of Huawei Watch 2 and they both run an operating system called LiteOS – which is a low power IoT OS developed by Huawei for its smart devices. The cool thing is, the OS is actually Open Source – (BSD-3 Licensed).

While it might be OpenSource, it still doesn’t solve “making it smarter”.

  • The watch’s source code is not open source – just the OS
  • There is no SDK to develop apps, so there is no documented way to know its APIs


Unsurprisingly, Huawei is aware of the problem and has been developing a more modern OS called HarmonyOS, which will allow them to overcome this. HarmonyOS is being positioned as an alternative to Android for cross-device integration. It runs on everything from watches, phones and tablets to TVs.

How did Huawei develop such an OS so fast? Turns out it is just a fork of Android that has been rebranded/code obfuscated… or whatever. Basically a copy.

The Battery

One of the good things about the LiteOS watches is their battery life. They are rated for 7 – 4 days of usage. Having used it for a week now, I will say it is kind of true. Even with GPS + Bluetooth Music on for 40 mins a day, it can still go for 4 days.

Now if Google’s WearOS can only provide a battery life of 24 – 48 hours, how can HarmonyOS which is a clone be any better?

Enter the combo OS

Huawei has done what we Linux users have been doing for decades – dual booting OS. Well, not exactly, but close. So the newer watches like Huawei Watch 3, come with both LiteOS and HarmonyOS loaded onto them.

For “smarter” requirements, HarmonyOS takes over, and for essentials, LiteOS subs in. It is kind of a smart solution. With this combo, Huawei seems to be bringing the best of both worlds to their watches.


The Honor watch also has a Huawei designed custom chip called Kirin which runs the LiteOS. So they could be going Apple’s way in Chip + Kernel + OS level integration.

Goodbye! Brave

Final Update (moved old updates to bottom)

Brave supports Chrome extensions. The problem was with the author’s version of Brave; it was roughly a year old. Very old versions of Brave didn’t include service keys (necessary for interacting with Brave’s privacy-preserving proxy-service), whereas modern versions do (which is why you and I are able to install extensions without any issue)

Sampson from Brave

To explain – the place I installed Brave from hasn’t made available any newer versions of the browser since December 2020. So the keys it shipped with have become outdated. Since no update was available, I didn’t see the usual orange “Update” button on the taskbar.

⚠️ NOTICE: Closing comments as they have moved from discussing the issue to attacking me for not being crypto friendly.

Original Post

I have been using the Brave Browser for almost 2 years, I think. @logic introduced it to me at some point and it has been my primary browser on both desktop and mobile, on home and office computers, since then.

I got my first heads up when I came across a post on HackerNews about Brave misbehaving due to the “Brave backend servers” being unreachable. It struck me as strange when a comment on the Github ticket mentioned that Brave servers need to be up for Brave to function.

This is a big design NO-NO for something as essential as a web browser. But then, the inertia of it being a daily driver, its amazing ad-blocking and tracker protection, Chrome extension compatibility, and the fact that I hadn’t faced any such issues kept me from making any changes.

Today I was looking to install an extension to manage the browser tabs and I ran into this

Can’t install any extension

I thought maybe the extension was buggy and tried a couple more and the same result for everything. And searching for the error led me to this Github Ticket, which again describes that it is a “server-side” issue and it was fixed.

Well, it is not fixed for me. But that’s beside the point. This amount of dependency on “backend servers” is ridiculous for a browser. For software as important as a browser, through which I have come to access almost everything digital, it is unacceptable. So with this post being the last thing I will do on Brave, I will bid goodbye.

Exploring options…

  1. An interesting alternative is Vivaldi – It is trying to do what Opera was doing pre-Chrome. It rolls email, calendar, RSS reader and browser all into one and also provides built-in ad-blocking.
  2. Open source Chrome aka Chromium – This used to be my primary driver before. So I am thinking of going back to it with the usual extensions like Ghostery, AdBlock+, etc. Not sure how much things have changed there.


Not sure who posted this in HackerNews. Thanks for all the feedback.

  1. I will be trying Firefox. So many people have recommended it. It’s something I have forgotten over the last couple of years and before that it frequently caused issues and was only my secondary browser for testing.
  2. There is nothing sinister about the decision or PR at work. I tried installing extensions, it didn’t work, I uninstalled and made a note of why I am doing it. Interpretations are all yours.

Update 2:

This is for people suggesting I jumped the gun and probably didn’t take the time to understand the real problem. I am a Chrome extension author myself; I had published a new version of my extension only 8 hours before and tested installation on Brave and Chrome. So, I understand the issue. And I have linked to the GitHub issues where this has been discussed.

Learning Rust – Day 1

Dev notes from learning Rust.

Official Website:


  • Rust is a compiled language – different from my daily drivers, Python and JavaScript, which are both interpreted languages.
  • Compiled means re-learning the difference between passing by reference and passing by value
  • Rust has mutability at the core of data management – so being conscious about immutable and mutable data
  • Errors are handled as a part of the Result of functions. Doesn’t need an explicit try...except (at least at this point)
  • Packages are called crates and there are binary crates & library crates
  • is the package registry
  • Cargo.lock file acts as the record of dependencies for Reproducible builds
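  • cargo update updates the packages to the most recent versions still compatible with the requirements in Cargo.toml (typically the latest bugfix release)
  • there is something called Traits; a trait defines methods, and it has to be brought into scope before those methods can be used on a crate’s types.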

Basics of Investing – A conversation in Tamil

A couple of weeks back, I hosted a space on Investments where Mohanakrishnan covered the first steps on an investment journey. This covered things like:

  • Emergency Fund
  • Health Insurance
  • Asset allocation and..
  • diversification

The snippets from the conversation are listed here.

Stepping into 2021

This is probably the most remarkable New Year in my life. I am finally giving up 5 years of successful (personally) freelancing and consulting work and will be a full time employee of a company.

I am joining the Times Internet group for a data-driven journalism team they are setting up. Ritvvij thinks I can contribute a lot, and I am as excited about it as I can be given the circumstances. **turns and glares at 2020**

Some background

You see, I saw this TED talk of Sir Tim Berners Lee way back in 2010 and got excited about OpenStreetMap and have been making maps since then. If you follow me on Twitter, then you perhaps would agree that I have had some success with it.

During 2010 – 13, my dream job was to become a data visualisation developer for a newspaper. TED Talks on that subject were watched and rewatched so many times – I miss you Hans Rosling. Unfortunately I never actually became a data visualisation expert. It remained a hobby – notice the “Visualization” on the top menu.

… and now

I think what brought attention was me trying to make NER based data extraction a commercial tool. Ritvvij DMed me to provide feedback on it and a few weeks later DMed again to hire me for Times Internet.*

Suddenly, in the last week of the crappy year that 2020 has been, I found myself holding a role that I had dreamt of almost 10 years ago.

Tell me this is not a remarkable new year.

* Well, not exactly. I did go through a two day paid test where I did some work to prove that I am suitable and impressed at least one other person with my data visualisation attempts.

some thoughts

I am happy I get to build software for things that I do as a hobby and get paid for it. I think I can call myself lucky in that sense, even though I might have to find a new hobby, the old hobby becoming work and all.

I have spent hours agonising over the frameworks I am not confident about (hello Django, hey NodeJS), the new cool features I am missing out on (what version are you on now, React? 16, 17, 99?) and the amazing new paradigms that are opening up. I have questioned my future as a Python developer and how what I know doesn’t fit into the 100s of job descriptions that I skip across. It is remarkable that a job found me without asking any of that.

This is similar to the instance when 40 lines of JavaScript made it to The Next Web and Lifehacker.

This has brought in the realisation that, while I might not be a 10x engineer who can reverse a binary tree for breakfast, I certainly can create solutions that people find valuable / useful / interesting at some level.

And for that I am happy.

Happy New Year 2021 – Cheers to New Beginnings.

Backup all the files in a directory to Azure Cloud

I had to copy all the files from my home directory into an Azure Blob container today: all the regular folders, without any of the dot files and dot folders.

Azure CLI provides batch upload functionality to upload folders. But there are two issues I faced:

  1. I needed to copy all the folders – and I didn’t want to run the command for each folder.
  2. I wanted to preserve the folder structure in the container as well.

After some trial and error I settled on this one liner.

for f in */; do az storage blob upload-batch -d "container-name/$f" -s "$f"; done

The for loop takes care of #1, and using /$f in the destination takes care of creating corresponding folders, preserving the same folder structure as in my home directory. The */ glob skips dot folders by default.

This assumes you already have set the AZURE_STORAGE_ACCOUNT and the AZURE_STORAGE_KEY environment variables for authentication.
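
If you would rather drive this from a script, the same loop can be sketched in Python. This is a minimal sketch that only builds the command lists, using the same `-d` and `-s` arguments as the one-liner. `build_upload_commands` is a hypothetical helper and `container-name` is a placeholder. The demo uses a temporary directory instead of a real home directory.

```python
import tempfile
from pathlib import Path


def build_upload_commands(base_dir, container):
    """Build one `az storage blob upload-batch` command per visible folder."""
    commands = []
    for entry in sorted(Path(base_dir).iterdir()):
        # Skip plain files and dot folders, like the shell glob */ does.
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        commands.append([
            "az", "storage", "blob", "upload-batch",
            "-d", f"{container}/{entry.name}",  # keep the folder name in the container
            "-s", str(entry),
        ])
    return commands


# Demo with a throwaway directory tree.
with tempfile.TemporaryDirectory() as home:
    for name in ("docs", "photos", ".cache"):
        (Path(home) / name).mkdir()
    (Path(home) / "notes.txt").write_text("not a folder\n")
    commands = build_upload_commands(home, "container-name")

destinations = [cmd[5] for cmd in commands]
print(destinations)
```

Each command list could then be passed to `subprocess.run(cmd, check=True)`, with the same environment variables set for authentication.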

NER Annotator / NER Tagger for Spacy

NER Annotator is now available to use directly from the browser


As with most things, this started with a problem. Dr. K. Mathan is an epidemiologist tracking Covid-19. He wanted to automate the extraction of details from government bulletins for data collection. It was a tedious manual process of reading the bulletins and entering the data by hand. Since the bulletins consist of paragraphs of text with the data embedded in them, I was looking to see if I could leverage any NLP (Natural Language Processing) tools to automate parts of it.

Named Entity Recognition

The search led to the discovery of Named Entity Recognition (NER) using spaCy and the simplicity of the code required to tag the information and automate the extraction. It kind of blew away my worries about doing Parts of Speech (POS) tagging and then custom-writing an extraction algorithm. So, I copied some text from Tamil Nadu Government Covid Bulletins and set out to test the effectiveness of the solution. It worked pretty well given the small amount of training data (3 lines) vs extracted data (26 lines).

Trying out NER based extraction in Google Colab Notebook using spaCy

But it had one serious issue: generating training data. Generating training data for NER is a pain. It is in fact the most difficult task in the entire process. The library itself is simple and friendly to use; it is generating the training data that is difficult.

Creating NER Annotator

NER annotation is a fairly common use case and there are multiple tagging tools available for that purpose. But the problem is they are either paid, too complex to set up, require you to create an account or sign up, or sometimes don’t generate the output in spaCy’s format. The one that seemed dead simple was Manivannan Murugavel’s spacy-ner-annotator. That’s what I used for generating the training data for the above example. But it is kind of buggy: the indices were out of place and I had to manually change a number of them before I could successfully use it.

After a painfully long weekend, I decided, it is time to just build one of my own. Manivannan’s tagger just uses JavaScript to create the training data JSON and then requires a conversion using a Python Script before it can be used. I decided to make it a little more bug proof.

This version of NER Annotator would:

  1. Use a Python backend to tokenize and detokenize text for tagging and generating training data.
  2. The UI will let me select tokens (idea copied from Prodigy from the spaCy team), so I don’t have to be pixel perfect in my selections.
  3. Generate JSON which can be directly loaded instead of having to post-process it with Python script.
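
To show why token-level selection avoids the misplaced indices, here is a minimal Python sketch of the idea. It assumes a plain whitespace tokenizer and the `(text, {"entities": [(start, end, label)]})` training format used by spaCy 2.x. The helper names, the sample sentence and the labels are made up for illustration and are not the tool's actual code or output.

```python
import re


def tokenize(text):
    """Split text into (token, start, end) triples on whitespace (a stand-in tokenizer)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def selection_to_entity(tokens, first, last, label):
    """Turn an inclusive token range into a (start_char, end_char, label) tuple."""
    return (tokens[first][1], tokens[last][2], label)


# Made-up sentence and labels, for illustration only.
text = "Chennai reported 1200 new cases on Monday"
tokens = tokenize(text)
entities = [
    selection_to_entity(tokens, 0, 0, "DISTRICT"),
    selection_to_entity(tokens, 2, 2, "CASES"),
]
training_example = (text, {"entities": entities})
print(training_example)
```

Because the character offsets are derived from token boundaries, a selection can never start or end in the middle of a word.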

The Project

I created NER Annotator with the above goals as a Free and Open Source project and released it under the MIT License.

Github Link:


Thanks to Philip Vollet noticing it and sharing it on LinkedIn and Twitter, the project has gotten about 107 stars on Github and 14 forks, which is much more reach than I hoped for.

Thanks to @1littlecoder for making a YouTube video of the tool and showing the full process of tagging some text, training data and performing extractions.