Forums: Index > Help desk > Standardization of records

Opening questions

So, I have been pondering... A Genealogy Wiki is so cool! :D But how do we link everyone together so that once you connect to a family, you aren't entering new pages for all the folks that are already here from another person's family? For example: Let's say I have a Joe Smith, he was born in 1711 and died in 1760. I make a record for Joe and put in all the details. Later another researcher puts their tree here online. They, too, have Joe Smith, so they make a record called "Joe Smith (1711-1760)". Since that isn't exactly like my Joe Smith, a new record is created. Worse, their Joe Smith had three sons. My family is related through Kate Smith, Joe's daughter, but theirs is related through Joe Smith Jr, the son. So their record is Joe Smith I. How can we kind of "funnel" people into editing my Joe Smith instead of making a new Joe Smith? I've been thinking about how to make records work well (I have Stewarts in my family, a lot of them... and they love to name themselves "James"... so I have, literally, 20 James Stewarts... I'm going to have to do something to differentiate them). Is there a standard we can sort of encourage researchers to use so that all of the records mesh nicely? Personally, I've been using GivName FamName (bdate-ddate) as my naming convention. That pretty much gives a unique record for each person, and allows for a ton of "James Stewart"s (and it looked like other folks were doing that too, so I'm kinda following suit). Is this even a concern? Again, I'm really new to the Genealogy Wiki... so...erm... Thoughts? :D Aabh 21:37, 30 May 2007 (UTC)

Response from Bill (with extras from others soon after)

A couple of things to keep in mind---in particular, this wiki has about 6000 articles, and 13000 pages. Compared to Wikipedia, this site is tiny. On the other hand, the site is growing. The significance of this is that so far running into duplications hasn't been much of a problem. There simply isn't that much overlap. If you create an article about your great grandmother, it's not likely that you will find it duplicated by someone else. On the other hand, it's only a matter of time before we start getting significant cross connections. In point of fact, there are at least two contributors on this site who have found that they DO share some common ancestors---albeit by marriage. Eventually they will end up putting in those ancestors, and at that point the problem you describe will arise.

How do you avoid that? Fairly easy in theory: you check the site to see if there's an existing article. To check, use the search function in the navigation pane. There's also a more complex search function that can be used, though the pointers to it are a bit obscure. First, check the Main Page. There's a link there to Main Page/Getting Started on the Wiki. One of the topics under that subarticle is using the search function.

Another mechanism that can catch duplications is based on the article name structure:
[First Name] [Middle Name] [Last Name] (YOB-YOD)
Theoretically, if you get the name right, any instance where there's an article with the same name will get picked up. Definitely not foolproof. While the inclusion of YOB-YOD in the name does not guarantee each person article will have a unique name (There could be two John Smiths born 1806 and died 1896---long odds, but possible), it's fairly unlikely that we would have any duplication simply because we have two different people with the same name and vita.
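As a rough illustration of the title structure described above, here is a minimal Python sketch. The function name `article_title` and its parameters are hypothetical, and using "?" for an unknown year is an assumption of this sketch, not something the site's convention page is quoted as saying. The "c" prefix for approximate years follows the (c1805-c1885) form mentioned later in this thread.

```python
def article_title(given, family, born=None, died=None, middle=None,
                  born_circa=False, died_circa=False):
    """Build a title such as 'John Quincy Smith (1806-1896)'."""

    def year(value, circa):
        # Assumption: an unknown year is written as '?'.
        if value is None:
            return "?"
        # Approximate years get a 'c' prefix, e.g. 'c1805'.
        return f"c{value}" if circa else str(value)

    # Skip missing or blank name parts so no double spaces creep in.
    name = " ".join(part.strip() for part in (given, middle, family)
                    if part and part.strip())
    return f"{name} ({year(born, born_circa)}-{year(died, died_circa)})"


print(article_title("John", "Smith", 1806, 1896, middle="Quincy"))
print(article_title("Joe", "Smith", 1805, 1885,
                    born_circa=True, died_circa=True))
```

Generating the title from separate fields, rather than typing it by hand, is one way to avoid the stray spaces and punctuation differences that make two titles for the same person fail to match.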

You might get a case where you miss a previous article about someone because of slight variations in the article title. For example, you might want to write an article about John Quincy Smith, search for "John Quincy Smith (1806-1896)", and miss an existing article simply because the previous article a) has a different year of birth (John Quincy Smith (1807-1896)), b) contains an extra inadvertent space (John Quincy Smith (1806-_1896), where "_" marks the stray space), or c) uses a middle initial (John Q. Smith (1806-1896)). You might spot such similar articles in a search, but the system wouldn't automatically warn you that an article about "John Quincy Smith (1806-1896)" already existed.

It would take only a modestly intelligent algorithm to spot articles about the same person but under slightly different names. It's not likely to happen anytime soon, though I'm sure that eventually something like that will exist. (The basic approach for that would be a routine that looked at similarities in the name, "John Smith" vs. "John Q Smith" vs. "John Q. Smith" etc., and then compared significant facts in the articles (DOBs etc.). Of course, it would have to recognize what was the DOB---realizing for example that 10/09/1806 is the same as "October 09, 1806", but different from "September 10, 1806".)
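To make the idea of a "modestly intelligent algorithm" concrete, here is a small Python sketch. It is only an illustration, not anything the wiki runs. The names `parse_title`, `possible_duplicate` and `parse_dob` are hypothetical, the one-year tolerance on dates is an assumption, and slash dates are read month-first, as the 10/09/1806 example implies.

```python
import re
from datetime import datetime

# Matches "Name (YOB-YOD)", tolerating stray spaces and a "c" (circa) prefix.
TITLE_RE = re.compile(
    r"^(?P<name>[^()]+?)\s*\(\s*c?(?P<born>\d{3,4})\s*-\s*c?(?P<died>\d{3,4})\s*\)\s*$"
)


def parse_title(title):
    """Return (name_words, born, died), or None if the title does not fit."""
    m = TITLE_RE.match(title)
    if m is None:
        return None
    words = [w.strip(".").lower() for w in m.group("name").split()]
    words = [w for w in words if w]
    if not words:
        return None
    return words, int(m.group("born")), int(m.group("died"))


def middle_match(x, y):
    # An initial matches any middle name starting with that letter.
    if len(x) == 1 or len(y) == 1:
        return x[0] == y[0]
    return x == y


def names_compatible(a, b):
    # First and last names must agree; a missing middle name matches anything.
    if a[0] != b[0] or a[-1] != b[-1]:
        return False
    return all(middle_match(x, y) for x, y in zip(a[1:-1], b[1:-1]))


def possible_duplicate(title_a, title_b, year_slack=1):
    """Flag two titles that may describe the same person."""
    a, b = parse_title(title_a), parse_title(title_b)
    if a is None or b is None:
        return False
    return (names_compatible(a[0], b[0])
            and abs(a[1] - b[1]) <= year_slack
            and abs(a[2] - b[2]) <= year_slack)


def parse_dob(text, formats=("%m/%d/%Y", "%B %d, %Y")):
    """Parse a date of birth written in either of two assumed formats."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


print(possible_duplicate("John Quincy Smith (1806-1896)",
                         "John Q. Smith (1806-1896)"))
print(parse_dob("10/09/1806") == parse_dob("October 09, 1806"))
```

A check like this can only suggest candidates for a human to review. Two different people can share a name and dates, and the same person can appear under dates that differ by more than the tolerance.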

I should also add that yes, there is a naming convention used on this site. See Genealogy:Page names. Bill 00:30, 31 May 2007 (UTC)

Cool! Thanks! I have another question; it has taken me two days to get a whopping 6 pages made here... I'm getting faster as I work, but I've got literally thousands of pages to upload here to get my tree online... I might look up each and every individual to see if they already exist... but I'm not sure everyone would... It might be a little too much to do if you are trying to get an entire tree on here... which I think is what we want folks to do... I guess the answer is that we just need to police our work pretty heavily to make sure it's tight... Aabh 14:01, 1 June 2007 (UTC)
It's conceivable that you may be able to copy Brian Yap's method - see Help talk:Loading Gedcoms. Robin Patterson 14:12, 1 June 2007 (UTC)
Eventually, when I have the time, and have invested in the needed software upgrades, I will start working on a broader solution to this and related problems. It's a long-standing item on my "to do" list. I've been leery of the Yap approach, as I can't figure out how it works. I don't know, for example, if it will overwrite files that already exist under the same name. I've also found the explanation of how to use it a bit obscure. Probably haven't invested enough time in it. If you try it, let me know how it worked for you.
Yes, data entry on this site can be time consuming, particularly if you want a very specific layout. I streamlined it for myself by creating a specific template/input box that I access from my user's page---it lays things out the way I want them, at least for initial purposes, and simplifies data input considerably. There's something similar in the "Create a page" of the navigation bar, but having to scroll down through all of the descriptive stuff is something of a drag. (I'd rather see the input boxes up at the top where they would be most directly usable. Then if you needed the caveats and explanations you could scroll down. Robin likes it the other way around because he's concerned that people won't read the caveats.) Bill 15:50, 1 June 2007 (UTC)
I can understand that. I think I side a little with Robin on that one (though I see the validity of both sides). I have templates already laid out on my pages, so that's covered. Really (and there is absolutely nothing you guys can do about this one :D) the most time consuming thing is that when I enter a person into the Wiki, I take the opportunity to go hunting for more information to assemble onto that person. When I get out of the blood of Scottish kings, I'm sure things will go a lot faster (I just hit a person for whom there is only one badly mangled record on a single person's tree, effectively no data, and it only took a moment to enter). Aabh 23:24, 1 June 2007 (UTC)
Well, of course! You've discovered the really great thing about the wiki! You can use this system to organize your information and use it as your personal gigantic database in the sky---and NOT be slaved to what some anonymous programmer decides about the way you need to organize the information. Not only that, but here I don't even have to be consistent---I can use one structure for certain people, and a totally different one for others, just depending on what's useful at the moment. But the key thing is that I can gather information and store it here, in any kind of organized format I want, then come back at a later date and build the article. Of course, as you point out, this does take time. But that's the nature of genealogy. Bill 23:40, 1 June 2007 (UTC)

Back to the point: I have yet another question (You guys will get so tired of me), this one is a Wiki programming question: If I don't have vital information, like the birth and death dates, can I enter the data as Joe Smith (1810s-1880s) to avoid the Joe Smiths that are in the other centuries? Aabh 23:24, 1 June 2007 (UTC)
The preferred convention is (c1805-c1885) Bill 15:50, 1 June 2007 (UTC)

Cool... I'll start using that immediately... But... what if you have no birth or death date at all? I don't know anything about "Tim Smith", except he was Bob Smith's father and George Smith's son... On my tree it just has a name. When I put the circa statement, should I just guess an average age of 50 years (c1810-c1860), guessing that he'd be born about halfway between Bob and George?

We won't get tired of you. Those insightful questions are highly educational, because I can see that if a guy as bright as you can ask that sort of question our guidance to newbies needs improving. I'm about to create a new Forum for you to help us with: Forum:Help improve the help pages. Robin Patterson 12:40, 2 June 2007 (UTC)

Cool! I'll be there! And... thank you, though I must admit I don't feel too terribly bright when I'm asking stupidly obvious questions and answering them myself! :D

That's sort of how I started it... and, let's say we fix Joe there, and another researcher finds Joe's birthdate... can we fix Joe so that the dates are in? Aabh 23:24, 1 June 2007 (UTC)
You fix problems in the names of articles by "moving" the article to a different page with the corrected name. The wiki keeps track of articles that have been moved, so if you have links to a particular article under the old name, the wiki knows where to take you to get to the current version of the article.
There's a move function on your navigation bar. I'm told that moving an article can be tricky because it can have unexpected consequences, particularly with pages that many other articles are linked to. I've not noticed it being a problem, except when you end up having to move an article numerous times. Bill 15:50, 1 June 2007 (UTC)
Yes, when you move a page twice you need to do a manual bypass of the name that's just been changed because the system refuses to handle a double redirect (but it offers guidance on which pages need editing). Robin Patterson 12:40, 2 June 2007 (UTC)
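The double-redirect situation described above can be modelled with a plain dictionary. This is only a toy model in Python, not the wiki software's own code or API. The function names are hypothetical, and the intermediate title "Joe Smith (c1812-c1865)" is invented for the example.

```python
def double_redirects(redirects):
    """List old titles whose target is itself a redirect (a double redirect)."""
    return sorted(old for old, new in redirects.items() if new in redirects)


def resolve(redirects, title):
    """Follow redirects from a title until reaching a page that is not one."""
    seen = {title}
    while title in redirects:
        title = redirects[title]
        if title in seen:
            raise ValueError(f"redirect loop at {title!r}")
        seen.add(title)
    return title


def bypass(redirects):
    """Point every redirect straight at its final target (the manual fix)."""
    return {old: resolve(redirects, old) for old in redirects}


# Each move leaves a redirect from the old title to the new one.
moves = {
    "Joe Smith (1810s-1880s)": "Joe Smith (c1812-c1865)",  # first move
    "Joe Smith (c1812-c1865)": "Joe Smith (1812-1865)",    # second move
}

print(double_redirects(moves))
print(bypass(moves))
```

After the second move, the oldest title points at a page that is itself a redirect. The bypass step rewrites it to point directly at the current article, which is the edit Robin describes doing by hand.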

If we reroute Joe Smith (1810s-1880s) to Joe Smith (1812-1865), and another Joe Smith (Not related) shows up with no birth date and no death date, but from the same general area, there's really nothing we can do, is there? Aabh 23:24, 1 June 2007 (UTC)

There are techniques to preclude that. One is to use the "circa" approach in the title as suggested above. The other is to add "by-names". For example, there are many "John Walkers", but only one called "Indian Killer", and only one called "Meadow Creek John". And you can't (((Robin thinks Bill meant "can"))) invent whatever by-name might suit your purposes---no need to worry about having a by-name by which the person was actually known in life. "Indian Killer", for example, really was known as Indian Killer, for good and sufficient reasons, but "Meadow Creek John" is just a by-name we've hung on the man because we needed to distinguish him from all of the other John Walkers. (As it happens, he is often confused with Indian Killer.) Bill 00:34, 2 June 2007 (UTC)
We'll have to make a record called "Joe Smith (1810s-1880s)-II"; even though we have fixed the first Joe, we can't change his physical record's name. Am I correct in this? Aabh 23:24, 1 June 2007 (UTC)

Nope, see above. Bill 00:34, 2 June 2007 (UTC)
If we did, all the links to it would get lost, am I correct? Aabh 23:24, 1 June 2007 (UTC)

Nope, see above. The Wikia uses a VERY well worked out system, and accommodates these kinds of things automatically---within reason. Bill 00:34, 2 June 2007 (UTC)
I ask this because if so there is a lot more weight in making an original record... you can't just make "Joe Smith (sometime)" and then go back and fix it later. Aabh 23:24, 1 June 2007 (UTC)
Hmmm... This all makes sense, but now I'm wondering about something else: What if I am related to John Walker, but I don't have any record of him being called "Indian Killer"... how do I link my tree to yours? Of course, now that I've said that the answer is kinda obvious, I do the footwork until the names match up (And see if John's father, mother, grandfather, etc match). Okay, answered my own question! Thanks! :D (I left the question on here in case it helps someone else! :D)
I'm probably misunderstanding your question, but either you're concerned about
a) linking to someone using a by-name that you aren't crazy about (i.e., John Walker IV (c1740-c1817) aka "Indian Killer"), or
The answer here is that we do something with the move function so that when you type John Walker IV (c1740-c1817) without the aka "Indian Killer" in the name, you get to the same article.

Actually, I'll probably do something with that title anyway, but for your purposes that's neither here nor there.

b) how you would come up with a hit on something as idiosyncratic as "Indian Killer".
Trust me. If you knew you were linked to John Walker IV (c1740-c1817), "Indian Killer" would be perfectly obvious. You might not agree that Indian Killer was John IV, but you'd know the argument. But that doesn't resolve the generic question---and I think you answered that yourself: it's a connection that you'd have to make through your research. If you didn't recognize the connection, the only problem would be that you'd create an article for someone who already had an article. And I believe that eventually we are going to have a lot of duplicate articles like that, because not everyone is in agreement as to the identity of XYZ.

Other responses

Bill has said most of it. Another way duplicates may be found is in the categories. Surname categories, birth and death year categories, for example. See User_talk:WMWillis#Category:Created_Using_Research_Template for an example of where it has already happened. In your example at the top, only the second researcher followed the recommended format (as you do in reality). If everyone follows the standard first time, there will be insignificant duplication. When a third person tries to create a page Joe Smith (1711-1760), the program will display the existing page of that name. If they are actually different people, whoever works that out is welcome to create a disambiguation page listing the more distinctive pages each has been given - Robin Patterson 12:03, 31 May 2007 (UTC)

Yeah, [[Category:Smith Surname]], if all Smiths on this site are listed there, it would be very helpful to see if there were any duplicates. -AMK152 (Talk, Contributions) 14:44, 25 August 2007 (UTC)
Eventually my plans include a mechanism for ensuring that every person article has, at a minimum, a Surname Category. That can be picked out of the article titles robotically, and so if the robot checks to see if there is a category for the surname, and there isn't one, it can insert one. Bill 18:48, 25 August 2007 (UTC)
That's a job for User:PhloxBot now that "he" has joined the team. Robin Patterson 12:22, 2 October 2007 (UTC)
PhloxBot has informed me he has every confidence he will perform his mission satisfactorily. I think I heard that before in 2001 regarding a certain HAL computer. Or was it Iraq. Anyway. Me, I am skeptical he will handle two-part last names perfectly. I'll do the common ones like de XXXXX, but nothing fancy.
There may also be glitches with over-specified person titles. The pathological case of the Indian Killer is not so hard. You take the string prior to the paren'ed material, suffix trim any titles (esq, jr.), suffix trim roman numerals. It will still goof, e.g. for William the Conqueror it will assume Conqueror is the last name.
Every article in the main namespace with a (xxx) something in parens at the end will get a surname cat if it doesn't already have one. I will tag them with a pseudo category- Bot generated surnames, listing all the new names so that folks can quickly spot any errors, or undo the lot of them if I (rather PhloxBot) bollocks it all up. A subsequent run will then remove references to this pseudo category.

If everyone is agreed, I'll set him to work right away. ~ Phlox 23:47, 3 October 2007 (UTC)
One qualified vote in favour from me. Seems fine taking the word/s preceding the "(" as long as the parens contain at least one group of three or four numerals - we don't want Cemeteries in Georgia (U.S. state) getting a surname category. Robin Patterson 13:26, 5 October 2007 (UTC)
Yeah. If it has no ? or number in parens, the bot will skip it. I'll do a few dry runs. There will be a delay on project start though. Maybe a week or two.~ Phlox 17:33, 5 October 2007 (UTC)
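For anyone curious how the surname rule discussed above might look in code, here is a rough Python sketch. It is not PhloxBot's actual source. The function name `surname_for` is hypothetical, and the list of two-part-name particles beyond "de" is an assumption of this sketch.

```python
import re

# Title must end with one parenthesised group, e.g. "Joe Smith (1711-1760)".
PAREN_RE = re.compile(r"^(?P<name>[^()]+)\((?P<inner>[^()]*)\)\s*$")
SUFFIXES = {"esq", "jr", "sr"}          # trailing titles to trim
ROMAN_RE = re.compile(r"^[ivx]+$")      # trailing roman numerals to trim
PARTICLES = {"de", "van", "von"}        # assumed list of common particles


def surname_for(title):
    """Guess the surname from an article title, or return None to skip it."""
    m = PAREN_RE.match(title)
    if m is None:
        return None
    inner = m.group("inner")
    # Skip pages whose parens hold neither a "?" nor a number, such as
    # "Cemeteries in Georgia (U.S. state)".
    if "?" not in inner and not re.search(r"\d", inner):
        return None
    words = m.group("name").replace(",", " ").split()
    # Trim trailing titles and roman numerals ("Jr.", "IV", ...).
    while words and (words[-1].rstrip(".").lower() in SUFFIXES
                     or ROMAN_RE.match(words[-1].lower())):
        words.pop()
    if not words:
        return None
    # Keep a common particle together with the final word.
    if len(words) >= 2 and words[-2].lower() in PARTICLES:
        return " ".join(words[-2:])
    return words[-1]


for t in ["Joe Smith (1711-1760)", "John Walker IV (c1740-c1817)",
          "Cemeteries in Georgia (U.S. state)"]:
    print(t, "->", surname_for(t))
```

As predicted in the thread, a rule this simple still goofs on titles like William the Conqueror, and it would also wrongly trim a short surname that happens to look like a roman numeral, so listing the generated categories for human review remains worthwhile.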