The problem I had been battling with started when a friend asked me to write a program that could match the headers of different columns across two CSV files. The headers would be very similar, but not so similar that converting both files' headers to uppercase would resolve the differences. An example might be “CurveConfig1”, “Curve Config 1”, “curve_config1” and “curveconfig1”. My first idea was to convert all of these down to the last format in that list: lowercase, with no spaces or underscores. Both lists of headers have to be converted to this final format, because you can't go back the other way; once the capitals and separators are stripped out, there is no way to tell where the word boundaries were. Once they are all in this 'normalised' format, they can be compared directly.

The following is from my notes at the time:

Question: Does encoding down to the lower-information state make the comparison less reliable, and should there be a hierarchical (not quite probability-level) value stating the confidence of the comparison? If we did, then: “CurveConfig1” -> “CurveConfig1” would be “Curve Config ...