Over the last couple of days I’ve read a few articles linked on Hacker News about the “real name” problem on social web sites. I first heard other people talking about this a few years ago when Twitter was taking off. A lot of ‘famous’ people ended up taking Twitter handles like “TheRealDvorak” because someone had already claimed their pen or stage name on Twitter. Sometimes people snap up famous names as a joke. Sometimes it’s just two people sharing the same name. Either way, this practice caused a bit of a stir in the tech community at the time. Recently, Google+ has been in the tech news for deleting accounts with names that don’t appear to be “real names.”
When you start looking deeply into names, it turns out things are a lot more complicated than you first thought. Patrick McKenzie has a very humorous take on how a typical western software developer thinks about names. The (not very humorous) summary of Patrick’s article goes something like this: Names are not immutable; a person’s name can change over time. Names aren’t even close to unique when you are dealing with a very large population. People don’t always have a “first” and “last” name the way typical Americans think of first and last names. Most names in the world don’t use the Latin character set. Some people don’t even have more than one component of a name…and I’m not talking about celebrities who legally change their names. Take a look at the Wikipedia entry for Javanese names.
I’ve been dealing with issues related to names for a long time now. The first project I worked on at TCG was the iEdison invention reporting web site. This was in the late 1990s, so this was the early history of the World Wide Web. We were nearly starting from scratch. Yes, there was a paper form and procedure we were modelling, but it wasn’t always the best choice to directly translate the paper process to the web site.
One component of iEdison required the user to list the names of all the researchers who participated on the grant. We would then collect other information about each person: address, education history, role in the project, etc. On the paper form you would, of course, have to type in all this information by hand every time. For the electronic form, it made sense to try to link these people up to past form submissions so the end user didn’t have to enter and re-enter everything each time. Most researchers who were on one grant were probably on another as well, so there was a good chance that our system already had the data we were asking for.
The perfect system would be able to link a researcher up to their profile as soon as the user entered the name. Right away we saw this was difficult. People sometimes use shortened or alternative forms of their name. I usually go by Al, but my full name is Albert. What if the lead scientist entered Al Crowley on the iEdison web site? An exact-name lookup would certainly miss my existing profile stored under Albert Crowley. On the other hand, lots of people have the same name, so we could get false matches. We can’t pre-populate a form with the wrong data because it would be a privacy violation, not to mention inaccurate.
We started thinking about what additional information we would need to match a person. For US workers, a social security number would be a good start, but that was not information we were gathering and we couldn’t ask for it. Name + business phone number? Name + address? Either of those could be a problem for a father-son team, since male children are regularly named after their father. Name, address, and date of birth? That would probably eliminate all the false matches, but it would create a lot of misses too. Aliases (as seen above), moving offices, or name changes due to marriage or other reasons would all foil an algorithm that depended on those fields.
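To make the trade-off concrete, here is a minimal sketch of exact matching on a chosen set of fields. This is not the iEdison code; the `Profile` class, the `find_matches` helper, and all of the sample people, addresses, and dates are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # Hypothetical record layout, not the real iEdison schema.
    name: str
    address: str
    birth_date: str  # ISO date string such as "1980-06-06"


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def find_matches(entered: Profile, stored: list, fields: tuple) -> list:
    """Return stored profiles whose chosen fields all equal the entered ones."""
    key = tuple(normalize(getattr(entered, f)) for f in fields)
    return [
        p for p in stored
        if tuple(normalize(getattr(p, f)) for f in fields) == key
    ]


# Invented sample data: one alias case and one father-son pair.
stored = [
    Profile("Albert Crowley", "1 Main St", "1970-01-31"),
    Profile("John Smith", "2 Oak Ave", "1950-05-05"),  # father
    Profile("John Smith", "2 Oak Ave", "1980-06-06"),  # son
]

# Alias: "Al" never equals "Albert", so the existing profile is missed.
alias_hits = find_matches(
    Profile("Al Crowley", "1 Main St", "1970-01-31"), stored, ("name",)
)

# Name + address: father and son both match, which is a false match.
son = Profile("John Smith", "2 Oak Ave", "1980-06-06")
loose_hits = find_matches(son, stored, ("name", "address"))

# Adding date of birth removes the false match...
strict_hits = find_matches(son, stored, ("name", "address", "birth_date"))

# ...but a simple office move now turns a real match into a miss.
moved = Profile("John Smith", "9 New Rd", "1980-06-06")
moved_hits = find_matches(moved, stored, ("name", "address", "birth_date"))
```

Every field added to the key trades false matches for misses, which is the dilemma we kept running into.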
After not much brainstorming, we gave up on the idea of automatically matching profiles. Allowing a user to store an ‘address book’ of profiles is a good 90% solution. That way the user wouldn’t have to re-enter data that they had personally keyed in already.
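The address book idea can be sketched in a few lines. This is only an illustration under my own assumptions: the `AddressBook` class, its method names, and the sample user IDs are hypothetical and not taken from iEdison. The point is that profiles are scoped to the user who typed them in, so nobody is ever shown data they did not enter themselves.

```python
from collections import defaultdict


class AddressBook:
    """Per-user store of previously entered researcher profiles (hypothetical)."""

    def __init__(self):
        # user_id -> {label chosen by that user: profile data}
        self._profiles = defaultdict(dict)

    def save(self, user_id: str, label: str, profile: dict) -> None:
        """Remember a profile under a label the user picks, e.g. 'Al'."""
        self._profiles[user_id][label] = dict(profile)  # store a copy

    def list_for(self, user_id: str) -> dict:
        """Return only the profiles this user saved."""
        return dict(self._profiles.get(user_id, {}))


book = AddressBook()
book.save("user-a", "Al", {"name": "Albert Crowley", "role": "developer"})

own_entries = book.list_for("user-a")    # user-a sees the saved profile
other_entries = book.list_for("user-b")  # user-b sees nothing
```

The user picks the right person from their own list, so the system never has to guess whether two names refer to the same human.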
My current project, NITRC, is a scientific community web site. We don’t concern ourselves too much with real names. People self-register and we do our due diligence to make sure that the scientific content they provide is good. If their work is up to our standards, we welcome them into our community. We do require a first and last name for each user, but if a person has an unusual name structure we are flexible with their input. In other words, we aren’t going to ban them if they just put a “.” or a single letter in as a first or last name. We also aren’t going to care if they happen to share a name with another well-known human.
It’s just my opinion, but I think in the end social media and community sites are going to have to be flexible when it comes to names. The online world is inherently chaotic, but so is the real world. People go by multiple names or nicknames all the time. When you meet someone at a conference, in a classroom, or at a party, you don’t ask for government-issued ID. They will be introduced with a name or nickname. Most likely, everyone in that social circle will know them by that name even if it isn’t their legal name. This is a system that has worked for thousands of years. It seems hard to believe that Facebook, Google+, or Twitter will be able to come up with something that works any better than that.
I’ve noticed this problem too. When I signed up to Twitter and tried to find certain well-known people to follow it was sometimes hard to work out whether a particular account was really theirs or not.