Posts

Notes on a kind of string equivalence

The problem I had been battling with started when a friend asked me to write a program that could match the headers of different columns in some CSV files. They would be really similar, but not so similar that casting both files' headers to uppercase would resolve it. An example might be “CurveConfig1”, “Curve Config 1”, “curve_config1” and “curveconfig1”. My first idea was that I could convert these down to the last format in that list. You would have to convert both lists of headers to that final format, as you can't go back the other way. Once they are all in this 'normalised' format they can be compared directly. The following is from my notes at the time: Question: Does encoding down to the lower information state make the comparison less reliable, and should there be a hierarchical, not quite probability level, value stating the confidence of the comparison? If we did then:   “CurveConfig1” -> “CurveConfig1” would be   “Curve Config 1” -> “Cu
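
The normalisation idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program from the post: the function names `normalise_header` and `match_headers` are hypothetical, and the sketch assumes headers are plain ASCII, so anything other than the letters a-z and the digits 0-9 is simply dropped.

```python
import re


def normalise_header(header):
    """Reduce a header to its lowest-information form.

    Lower-cases the text, then removes every character that is not
    an ASCII letter or digit (spaces, underscores, punctuation).
    """
    return re.sub(r"[^a-z0-9]", "", header.lower())


def match_headers(left, right):
    """Pair each header in `left` with a header in `right` that has
    the same normalised form. Headers without a partner are omitted.
    """
    lookup = {}
    for header in right:
        # Keep the first header seen for each normalised form.
        lookup.setdefault(normalise_header(header), header)

    matches = {}
    for header in left:
        key = normalise_header(header)
        if key in lookup:
            matches[header] = lookup[key]
    return matches


if __name__ == "__main__":
    examples = ["CurveConfig1", "Curve Config 1", "curve_config1", "curveconfig1"]
    for example in examples:
        print(example, "->", normalise_header(example))
```

Because both sides are reduced to the same form before comparing, the mapping is one-way: two headers that differ only in case, spacing or underscores become indistinguishable, which is exactly the loss of information the notes go on to question.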

Git branch descriptions

I often have quite a few branches open, and although my branchList command has stood me in good stead for a long time now, I would like to add notes or descriptions to a branch. alias branchList='git for-each-ref --sort=committerdate refs/heads/ --format='\''%(HEAD) %(color:yellow)%(refname:short)%(color:reset) - %(color:red)%(objectname:short)%(color:reset) - %(content color:green)%(committerdate:relative)%(color:reset))'\''' I don't really want to make the branch names overly long, and I find that the commit messages aren't always good enough to tell me the purpose of the branch. I guess it might just be bad practice on my part that the full reason I created a branch can't be derived from the name and last commit message alone. Recently I found a way to add descriptions to branches, so that I can record why I created a branch and what the full scope of the work on it was. This

Why even the most advanced AI might not be as useful as you think

Say you had been able to make a general AI. Given the level of intelligence it has, you would have thought booking you a holiday would be a simple task. And it would be, if it weren’t for robots.txt and the terms and conditions websites have against non-human agents using their services. Ryanair, for example, forbids booking using any automated system. That means that although your AI is quite capable of booking your holiday, it would actually break the terms and conditions of your holiday for it to do so. Will this be an issue? It already is: for the time being, lots of companies have to employ low-paid workers to crawl through sites in a way that a simple spider program could do far more efficiently, just because the websites being scraped forbid the use of scrapers. It doesn’t matter whether the websites know that the agent opening their website is a human or not; there will always be ways to fake being human online. The issue arises more from the legal challenges that

Solar atoms

I recently got a request from a friend to write a post on why this is obviously silly. I’m actually not sure why it is that silly. If you take two diagrams, both showing circles orbiting circles, and you are told that one is the thing that contains us and the other is the thing that makes us, it isn’t too hard a stretch of the imagination to point at a circle in one image and ask if there is an analogous circle in the other image. I believe this skill of analogy is one of the things that facilitates human ingenuity. For, by analogy, you could then think: if the two are alike, perhaps the pattern continues. After all, what are those circles in the diagram of an atom made of? Could they too be made of smaller things? This is similar to looking at the map of the world and asking “Did those two ever fit together?” It’s worth mentioning here a favourite quote of mine: “All models are wrong but some are useful” In the two mode

Refactoring can be frustrating.

As I read through code I often rewrite it as I go; I find this helps me to understand it better, a bit like how doodling has been shown to help people remember stuff. I also take notes as extra comments as I go. Often, after having gone through a bag of code, I find that some of the edits I have made improve it. Ideally I could just create a new branch, open a merge request, go through the process and eventually get that code merged. One issue that comes up, particularly often on large refactor work, is conflicts and conflict management. Now PyCharm has a great ability to resolve a lot of conflicts, but if the refactor is large enough to be moving whole collections of modules to a new architecture then a simple side-by-side comparison of the changes doesn’t work. It can sometimes be hard to see how to integrate the new work. Alongside this is the fact that for large, architecture-sized refactor projects the process to get merge approval can take a long time,

Downloads manager

Perhaps the best lunch-hour Python project I have seen is the downloads folder manager. I originally saw this in a YouTube video of short Python projects which unfortunately I haven’t been able to find again to link to here. I think almost everyone has a messy downloads folder: we download images, PDFs, programs and numerous other things, which creates a massively long list of items that can make it hard to go back and find things. I usually kept the folder ordered by most recent, which worked quite well as a heuristic for what I wanted. The idea behind the downloads folder manager is that it watches for a file to get downloaded, then, based on some rules, moves it away into one of a set of folders. As it’s a simple Python script, the whole thing is completely customisable and it’s easy to pick what folders you create. I am currently using the following script to manage mine: This sorts the downloaded files into the following categories; images, archi
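
To make the rule-based move concrete, here is a minimal sketch of the idea. It is not the script from the post: the `RULES` mapping, the folder names and the function names `destination_for` and `sort_downloads` are all hypothetical, and the sketch does a single pass over the folder rather than watching it, so it would need to be run on a timer or wrapped in a file-watching loop to behave as described.

```python
import shutil
from pathlib import Path

# Hypothetical rules: file suffix -> destination folder name.
# These are illustrative only, not the categories used in the post.
RULES = {
    ".png": "images",
    ".jpg": "images",
    ".pdf": "documents",
    ".zip": "archives",
}


def destination_for(file_name, rules=RULES, default="other"):
    """Pick a folder name for a file based on its (case-insensitive) suffix."""
    return rules.get(Path(file_name).suffix.lower(), default)


def sort_downloads(downloads, rules=RULES, default="other"):
    """Move every file directly inside `downloads` into a sub-folder
    chosen by `destination_for`. Returns the new paths of moved files.
    """
    downloads = Path(downloads)
    moved = []
    # Take a snapshot of the listing first, since we modify the folder.
    for item in sorted(downloads.iterdir()):
        if not item.is_file():
            continue  # leave existing sub-folders alone
        target_dir = downloads / destination_for(item.name, rules, default)
        target_dir.mkdir(exist_ok=True)
        target = target_dir / item.name
        if target.exists():
            continue  # do not overwrite a file with the same name
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved


if __name__ == "__main__":
    for path in sort_downloads(Path.home() / "Downloads"):
        print("moved", path)
```

Keeping the rules in a plain dictionary is what makes this kind of script easy to customise: adding a category is a one-line change.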