What’s not in a name?
Although names are merely a way to identify things, it can be hard for people (and computers) to match the correct name to the correct object. The large variety of names given to people to set them apart from others is the source of most programming nightmares. On top of that, a ton of problems can make it even harder to identify the correct person when comparing their full name to a list that you already have:
- Misspellings
- Abbreviations
- Nicknames
- Similar sounds
- Foreign names
- Missing first, middle & last names
- Multiple middle & last names
- Initials only
- Apostrophes
- Reversed first/last names
- Different variations
- Non-English characters (Unicode)
- Different phonetic spellings
- Titles
- Suffixes
- Prefixes
- Organizations
So why am I suddenly interested in names? I’ve got a little parsing project that I’m working on. I have a list of thousands of foreign names. A good deal of time was spent writing some algorithms to parse out the first, middle and last names. Because the parsing is not perfect, I am retaining the original name on my list along with the parsed information.
I was able to parse out all of these names, but the real problem is something different. Now I need to support the ability for someone to give me an additional list of names to compare against. I am told that I need to account for just about all of the problems that I just mentioned and return a list of possible candidates for each name.
Sometimes it just seems impossible. I came into a bit of luck today with the discovery of the Soundex phonetic algorithm. The codes are built from a name’s first letter and the consonants that follow it, and they allow you to compare similar-sounding names. For example, my last name “Moten” has a Soundex code of M350. Several last names share the same code, such as Madden, Maiden, Mathena, Matheny, Matney, Medin, Medina, Metheny, Mitton, and Motion. The database software that I am using has built-in support for Soundex.
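To see where a code like M350 comes from, here is a minimal sketch of the common American Soundex rules in Python. It is only an illustration, not the implementation built into my database, and database products are known to differ slightly in how they treat H and W. The function name soundex is my own choice, and the sketch assumes plain A-Z letters, so accented characters are simply dropped.

```python
def soundex(name: str) -> str:
    """Return a 4-character American Soundex code (illustrative sketch)."""
    # Map each coded consonant to its Soundex digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    # Assumption: only A-Z is considered; anything else is ignored.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        digit = codes.get(ch, "")       # vowels and Y map to ""
        if digit and digit != prev:
            result += digit
        prev = digit                    # a vowel resets prev, allowing a repeat
    return (result + "000")[:4]         # pad with zeros, cut to four characters


if __name__ == "__main__":
    for n in ("Moten", "Madden", "Medina", "Motion"):
        print(n, soundex(n))
```

Every name in the list above comes out as M350 with this sketch, which shows both the appeal and the problem: the code is forgiving, but it is also very coarse.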
Well, it helps out a bit for phonetics and some spelling, but sometimes it returns too many results that are not likely to be a match. As the database grows in size, I can only imagine that it will get worse. The other problem is that this little feature only works wonders if you have complete names in the list to compare against. If the first name is missing, or you only have an initial, then you are out of luck.
If I had my way … There I go again. I am always thinking that I can force everyone to do as I wish. The best scenario would be to force people to always submit a first, middle, and last name when comparing, and to also get someone to fill in all the missing data in the list. That would waste a lot of time. Can you imagine someone looking at a large list and filling in the missing portions of each name for thousands of people?
So I need to skip the best scenario because it isn’t the best for anyone else except me. For now, I’ll have to work on making many queries against the list to compare exact matches, soundex, initials and names without first or middle names. There has got to be a better way around this.
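One way to picture those queries is as a single comparison that treats a missing part as "unknown" and an initial as a first-letter match. The following is a rough sketch in Python rather than in my database's query language. The names clean, part_matches and candidates are hypothetical, names are assumed to be (first, middle, last) tuples with None for a missing part, and the key parameter is a placeholder where a phonetic function such as Soundex could be plugged in.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Assumed shape of a parsed name: (first, middle, last), None when missing.
Name = Tuple[Optional[str], Optional[str], Optional[str]]


def clean(part: Optional[str]) -> str:
    """Lower-case a name part and drop surrounding periods and spaces."""
    return (part or "").strip(". ").lower()


def part_matches(a: Optional[str], b: Optional[str],
                 key: Callable[[str], str] = str.lower) -> bool:
    """Compare one name part, tolerating missing parts and initials."""
    a, b = clean(a), clean(b)
    if not a or not b:
        return True                 # a missing part cannot rule anyone out
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]         # initial only: compare first letters
    return a == b or key(a) == key(b)   # exact match, or same phonetic key


def candidates(query: Name, names: Iterable[Name],
               key: Callable[[str], str] = str.lower) -> List[Name]:
    """Return every listed name whose parts are all compatible with query."""
    return [n for n in names
            if all(part_matches(q, p, key) for q, p in zip(query, n))]


if __name__ == "__main__":
    listed = [("John", "Q", "Moten"), (None, None, "Moten"),
              ("Jane", None, "Moten"), ("J.", None, "Madden")]
    print(candidates(("J", None, "Moten"), listed))
```

With the default key, only exact matches, initials and missing parts are tolerated. Passing a phonetic function as key widens the net, at the cost of the extra false positives mentioned earlier. Note that a query with every part missing matches the whole list, so some minimum amount of information still has to be required.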