Forum:Standardising page names for individuals

(NB: Much of this could be archived. Towards the end of 2008 we got close to agreement, so nobody has to read anything from 2007, although newcomers may find it helpful as background.)

Introduction
This discussion started a couple of years ago. Now that we have over 20,000 articles and are gaining several new active contributors every month, I think we should finalise it and save time and probably attract and keep more contributors.

First I will collect relevant extracts from other pages. Robin Patterson 13:58, 3 July 2008 (UTC)

Genealogy talk:Page names Oct-Dec 2007
(Extracts of parts that were not later fully superseded; changing heading levels to fit here, and editing very slightly.)

Finding us on Google
Your response has included the somewhat confirmatory "a lot depends on how we name things" but has not offered your solution, '''your "how we [should] name things". Please spell it out.''' ... Robin Patterson 14:53, 11 October 2007 (UTC)


 * ..... Shall we be content to wait until that far future when folks learn how to use advanced search interface features like allintitle?  Are we going to expect that folks who are curious about genealogy and search for their ancestor already know the birth and death dates?  You are kidding.  ...
 * ... The standards are requiring data to be stored in the title- data like birthdate that could be wrong. EG: we have a redirect from an article that says William was born in a certain year which was wrong.  Ok.  Sure the redirect mechanism worked, but all the articles using that link says he was born in the wrong year.  Editorially, a person is certainly required to go back and change them, but take a bet on how often folks will do that.  Sure double redirects can be fixed, but how about when we have a hundred users online simultaneously, and they are all changing around stuff to conform to their evolving understanding of their ancestors.  It will be a mess. ...
 * ..... If you use search features that 90% of folks have never heard about, or rely on the expectation that people already know a lot about their ancestors, then you have narrowed the footprint of your potential market radically. ...
 * What are the possible steps? They are disagreeable for different respects because they don't look genealogical.  Don't get me wrong.  I like the full information a long name provides at a glance.
 * Never place anything between first name and last name. Absolute ban on that.  Middle names don't have to be banned, just not between the first and last.
 * Use WP KISS standards for naming. Keep it simple.  If Fred I of England is good enough for WP, ours shouldn't be Fred I, Earl of Smith's City, 6th Viscount of Wherehouse, ....  If the root name doesn't match WP's, then a good case should be made why not.
 * There is a need to disambiguate names, so I can see putting at least one number in there. Much better if it were a number like Familysearch's approach using AFNs.  But I can go along with a single date.  So on that score, we could include
 * birth date only
 * Last 4 digits of guid.
 * It's probably not a big deal if it is either of these or some other single number or equivalent. I can probably go along with any alternative.  Currently I use birth only.
 * ~  Ph l o x  17:25, 11 October 2007 (UTC)


 * (Copied from User talk page to here:)
 * What about for people whose death dates are known and their birth dates are unknown? - AMK152 (Talk • Contributions) 21:29, 10 October 2007 (UTC)
 * It's not a special case. Birth is always (befDEATHDATE), so you can specify in this instance as well.  When both are unknown, Foo Bar (?-?) disambiguates no better than Foo Bar (?).  In any case, at best your point makes guid more attractive, ....   ~  Ph l o x   17:36, 14 October 2007 (UTC)

Phlox proposal

 * 1) Keep statement that these are guidelines, not blanket rules.
 * 2) Recommend that users NOT insert middle name, initial (or anything else) between first and last name. Putting middle name elsewhere is ok, if desired.
 * Lower priority recommendation:
 * 1) Omit death date unless birth and death date are certainly known. Otherwise, only use birth date.
 * ~  Ph l o x  17:27, 20 October 2007 (UTC)

Extensions and consequences
I presume we would lift Google rank even further by omitting birth date too (thus in most cases matching the WP name if any). That would multiply by a few hundred the likely number of disambiguation pages necessary; not a problem, a difference of degree not of kind, because the omission of the death date already multiplies the likely number by several dozen. How about it? Robin Patterson 13:03, 1 December 2007 (UTC)

If the disambiguation pages contain merely links to the pages for individuals, will that affect Google ranking? If so, I expect we could make a practice of adding detail ... to each link. ... Robin Patterson 13:03, 1 December 2007 (UTC)


 * I agree with omitting the dates, like the Wikipedia name. However, some people are known for their middle name (like John F. Kennedy or Warren G. Harding) and a lot of my Dutch and German ancestors are known to researchers by their full name. Like Gerrit Hendrik te Kolste (1794-?). Calling him Gerrit te Kolste would be confusing. Anyway, I agree with omitting the dates to get something like William I, King of England instead of William I, King of England (1027-1087) .... Of course, doing this, we would eventually develop many disambiguation pages. We are at a point in the Wikia where there are only a handful of contributors. As the number of contributors grows, so will the number of people who understand the use of disambiguation. So, people will be able to find us quicker and those who stay will gain knowledge in the use of disambiguation, and use them if necessary. - AMK152 (Talk • Contributions) 13:16, 1 December 2007 (UTC) .....


 * Thank you for taking time to set out a possible process. But I think we can make it shorter than that.
 * Create disambiguation page designed as the target of simple googling: John Smith. It has Template:Disambig and it lists all the John Smiths in the wiki by whatever complex page names they currently have, eg John Isaac Smith III (1902-1985).
 * Every John Smith (including every John Isaac Smith etc) in the wiki can be given a full page name (in the style recommended) as soon as he is created, with a link to him on John Smith. No need for quite as many pagename changes as listed above: ideally none.
 * So a person finds our John Smith page very near the top of the hit list on Google, possibly with a context extract showing that there are several people on our page, so they come to us and can read the list then go off to find one that seems to match their one. Not unlike a RootsWeb search, ....
 * Robin Patterson 02:27, 2 December 2007 (UTC)

I think I mentioned disambiguation pages were one reason I was producing for each page an info subpage. When I get all pages with info pages we can do the DPL disambiguation pages: EG: Note that I made the second table on these pages sortable. EG: Click on the little arrow button in the Birth Place column, and all the John Smiths would be clustered into the location the person is interested in. Clearly for this reason, I would want to display these locations in reverse Country-State-County-city order so they would cluster in the most useful way for users. ~  Ph l o x  04:24, 2 December 2007 (UTC)
 * Barack Obama (disambiguation)
 * George Bush (disambiguation)
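
The reverse Country-State-County-city ordering mentioned above can be illustrated with a small sketch. It is not the DPL table itself, only a plain Python model of why reversing the place string makes a simple alphabetical sort cluster people by location. The function name and the sample rows are hypothetical, invented for illustration.

```python
def place_sort_key(place: str) -> str:
    """Reverse a 'city, county, state, country' string so the broadest unit comes first."""
    parts = [p.strip() for p in place.split(",") if p.strip()]
    return ", ".join(reversed(parts))


# Hypothetical rows of a "John Smith" disambiguation table: (page name, birth place).
people = [
    ("John Smith (1801)", "Boston, Suffolk, Massachusetts, USA"),
    ("John Smith (1750)", "Leeds, Yorkshire, England"),
    ("John Smith (1822)", "Salem, Essex, Massachusetts, USA"),
]

# Sorting on the reversed string groups by country, then state, then county.
rows = sorted(people, key=lambda row: place_sort_key(row[1]))

for name, place in rows:
    print(place_sort_key(place), "|", name)
```

With the place stored city-first, the same sort would put Boston, Leeds and Salem in that order, separating the two Massachusetts entries.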


 * Surely a page called Barack Obama (disambiguation) will rank lower in a Google search than Barack Obama? Why complicate things? Can't we use the simple system proposed above, ... which matches Wikipedia's prime method of disambiguation? Or are you saying we will have both? I know that this wiki has several pages with names ending in "(disambiguation)", but most of them are that way because their creator had not properly studied the way WP makes that sort of thing as simple as possible with only a few defined exceptions. Robin Patterson 05:35, 2 December 2007 (UTC)


 * Fine with me. I like simple ....   What matters is how many inbound links there are to the article.  I think I posted a pointer to a good discussion of google's algorithm, but it takes into consideration a lot of other things.  My earlier point with google only had to do with not inserting extraneous terms between first and last name, so that our pages would be eligible for the most common kinds of searches.   ~  Ph l o x   10:23, 2 December 2007 (UTC)

Google rank of pages - adding anything to plain name?

 * (early April 2009 import of significant discussion from Phlox's talk page)

"Page name" is one of the things we should settle before wholesale importation from GEDCOMs, I agree. I read your comment about it not mattering much whether a page name contains " (disambiguation)". Doesn't the addition of anything to the plain name reduce the "percentage" part of what determines ranking? Wasn't that part of your reason for the firm decision to depart from our standard of including death dates (which decision has not been adopted by other major contributors)? Robin Patterson 00:50, 6 December 2007 (UTC)
 * Google was only part of it. The part having to do with Google is the bit about including middle name between the first and last name.  That is really dumb from a ranking perspective, because the weight of a term is determined by how closely it is adjacent to the other search terms.  Secondly, it is important for phrase search because "Elvis Presley" will get zero hits on our site because article Elvis Aaron Presley (1935-1977) makes no mention of an "Elvis Presley".  Even if it did, our rankings would be submerged because our site felt the term was not germane enough to the subject to include in the Title.  Really, really self defeating...
 * Anyway- you have a point that the more terms you shove in a title, the more you water it down. I was stating that adding "disambiguation" to the end is not fatal like adding "Aaron" to the middle is.  So there is a google factor there, sure.
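
The phrase-search point can be shown with a toy model. This is only a sketch of exact-phrase matching against a page title, assuming a phrase matches when its words appear adjacent and in order; it says nothing about how Google actually ranks pages, and the function name is made up for the example.

```python
def contains_phrase(title: str, query: str) -> bool:
    """True if the query words appear in the title as adjacent words, in order."""
    title_words = title.lower().replace("(", " ").replace(")", " ").split()
    query_words = query.lower().split()
    n = len(query_words)
    return any(
        title_words[i:i + n] == query_words
        for i in range(len(title_words) - n + 1)
    )


# A middle name between first and last name breaks the phrase.
print(contains_phrase("Elvis Aaron Presley (1935-1977)", "Elvis Presley"))

# Anything appended after the plain name leaves the phrase intact.
print(contains_phrase("Elvis Presley (1935-1977)", "Elvis Presley"))
print(contains_phrase("Elvis Presley (disambiguation)", "Elvis Presley"))
```

Under this model the first title does not match and the other two do, which is the sense in which a suffix is "not fatal" while an inserted middle name is.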


 * The main problem I had with putting data like dates in there is that it encourages churn in the database. Every time someone comes in and has a different date in their genealogy files, or thinks that the date is not certain (should be c1856, not 1856)- they want to move the article.  But our articles are aggregates- All will have at least the article page as well as an info page.  Many will have separate ancestors, pictures and tree pages as KBorland is doing.  Will the contributor move all pages properly?  Maybe sometimes, but probably it will get screwed up frequently.  So it presents a colossal maintenance pain for what- so that we can follow a convention designed for paper filing methods? Discovering the death date is just a click away.  And the cost of saving people the time to click?  Massive maintenance burden.  But let's get realistic about that.  Given millions of articles, it means this work simply will not get done- which means what?  That's right- Our site starts to turn to junk, with numerous broken articles.


 * Certainly, that problem doesn't happen at the 10,000 article number, but hey- Are we planning for a 1st tier genealogy or not? Sure we could fail for any number of reasons- but why plan for failure?  Plan like we are really going to pull this off.  We are not going to have one million, or just two million articles.  The gedcoms on file already exceed those numbers.  And what happens when this goes global and snowballs?  Both of us may be old guys, but within our lifetimes we are easily going to see hundreds of millions of articles on this site Robin.  That is an enormous enormous maintenance burden for those that come after us and we have to be hard nosed about what we do to preempt massive maintenance burdens that don't deliver significant pay offs for users.


 * I do not make the proposal lightly but we have to look at the rationale for the conventions and challenge whether they are relevant to our problem domain. Should we use all caps in the Surname?  Well a decade ago, suggesting anything different was a rebellious/naive idea.


 * I don't see why we should be a slave to convention, especially if it delivers insignificant benefits and will have such serious costs in the future.
 * ~  Ph l o x  02:20, 6 December 2007 (UTC)


 * OK, death years are nice (and very little detriment googlewise) if nobody's going to change them but we can't guarantee that, so chuck them (and allow 50 times as many disambig pages because that's a minor irritation). Same with birth years then? They are even more susceptible to change, and a name with just one date is unconventionally odd and therefore likely to puzzle newcomers and even turn them away. Shall we then agree that an individual's page be the plain name (same as on his or her disambig Google-target page) plus a fixed distinguisher such as a thingy-number (which you mentioned a while back but I haven't time to look up) - I could be happy with that. Robin Patterson 13:23, 6 December 2007 (UTC)
 * Wow. Birth years too?  I knew you were a closet radical.  Well, I suppose it is not totally radical.  After all, LDS does this in their ancestry files.  In ancestry files, they use NAME (AFN).


 * The thingee- the generic term for what an AFN or a genealogics Person ID is, is "UID" (unique identifier). Some genealogy sites use the term UID and export them as part of their gedcom data.  Naive users are going to create ugly links though- eg David Henderson (729382).  I suppose I could make an info template that looks up the data on the fly.  EG:

looks up birth and death and displays that as if the person had gone to the bother of typing the wikitext David Henderson (c1734-1810).
So what are the Cons?
 * Will disambig pages really only be a minor irritation? I have seen a surprising number of multiples even for uncommon names.
 * There will be thousands of them but very easy to handle. Robin Patterson 03:57, 9 December 2007 (UTC)


 * Audience reaction? Will folks have a Frankenstein reaction when they see David Henderson (729382) as the title of the article on their beloved ancestor?  Will it really be William I of England (390351)?
 * Learning curve/ barrier to contributing? Ok- We make it simple to generate UIDs-let them make them up- Any 6 digit number.  Otherwise, if we have something more complicated like a computer generated UID as I originally proposed, then we have to wait until I figure out how to make a widget do a form that will input the data and create the UID on an info page for them.
 * The way DPL works, I am not sure that I will be able to generate a friendly name. I think it may have to have the UID because it wants to use the real title of the article.  Of course, we can put a column in there for birth and death years, so not that big a deal if that turns out to be unavoidable.
 * Will long numbers in a title downgrade a google hit? Possible rationale- the page is more likely to be a technical or database like dump page. They might do this- it is said they examine 60 or more factors.  No way to know for sure.  I suppose we could make it 3- there might be collisions, but they will know when they create a page.  If we make it 4, people might assume it is a date.  When a 3 range starts to get exhausted, we could tell people to use 6 digit numbers.
 * Sing along with me: "Secret Agent man... We're giving you a number, and taking away your name".  Numeric identifiers make ancestors more impersonal.  Date of birth is less impersonal, but that is not without problems- eg John Smith (1956)  Worth it?  I don't know.  But it's the impersonal factor- how big a Con it is, that's a subjective judgement call.
 * Any more cons?  ~  Ph l o x   17:22, 6 December 2007 (UTC)
 * Since the naming disambiguities are only going to occur, by definition, in the same surname, why not have people who own or regularly contribute to each surname decide on a case-by-case basis how to name their individuals? For example, the Dzyban family (a small family of an ethnic minority in the hills of Poland) will probably have different needs in page naming than the Smith families of the U.S.  Perhaps instead of numbers in the titles to disambiguate, there are other options as well, i.e. John Jones (1800-?) of Pittsburgh, son of Bill & Mary.  Perhaps the creativity and flexibility allowed in the Wiki format is its best feature, after all.  I think uniformity in page naming is far less important than uniformity in categorization, since the categories we're creating are going to become the real tools that set this site apart from others as far as search capabilities.  For example, being able to generate automatic lists such as New York City births in 1608, etc. Kborland 01:38, 7 December 2007 (UTC)
 * Certainly, these will be guidelines not policy and folks may elect to depart from them. Robin is aware that I shall be importing a very large number of articles into Familypedia in the foreseeable future.  For now, that is the scope of articles that will be affected.  But due to the number of articles involved, it will become the de facto standard.  So although it could be altered later via bot, it would be better to have the discussion now rather than after the import.


 * Anyway- to your comments- you are right that predictability and uniformity in cats is important. But your particular example is mistaken.  A hard coded category scheme as you seem to suggest won't scale.  Fortunately, we don't need to hardcode categories like "New York City births in 1608".  The database functionality you envision will use real database queries using DPL.  The database features depend on stability in names of objects, and that is a vulnerability we currently have.  This along with google searching is the driving motivation of the naming convention discussion.  Do you see how Template:Info categories can extract information from an info page and generate a category?  Well- when there are enough "born in New York city" folks on the wiki, that template will generate a marker along with a bunch of other markers for fields you can query like mother's maiden name, death city etc etc.  Buckets of them to query on.  If you attempted to hardcode them as categories, you run into a combinatorial explosion.  But you don't have to because DPL can generate a list with a query like: "get me articles with death city NYC, surname Smith, death decade 1890".  I've done demo examples for Catherine Price, Bush and Obama.   It can do this because all the information is in an easily locatable form so that it can be extracted easily.  But what happens if the person renames the article and forgets to move the info page?  Notice how many times you have to rename due to changing information about the subject?  Why are people doing that?  Because they are putting data that is subject to change in the name.  This makes the wiki by definition an unstable database.  Stability in naming means that info pages don't get separated from their articles.
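
The kind of query described above ("death city NYC, surname Smith, death decade 1890") can be modelled outside the wiki as a filter over structured records. The sketch below is plain Python, not DPL syntax; the record fields, page names and data are all hypothetical and stand in for whatever an info page actually stores.

```python
from dataclasses import dataclass


@dataclass
class InfoRecord:
    """Hypothetical stand-in for the fields held on a person's info page."""
    page: str
    surname: str
    death_city: str
    death_year: int


def query(records, surname, death_city, death_decade):
    """Return page names matching surname, death city and death decade (e.g. 1890)."""
    return [
        r.page
        for r in records
        if r.surname == surname
        and r.death_city == death_city
        and r.death_year // 10 * 10 == death_decade
    ]


# Invented sample data, for illustration only.
records = [
    InfoRecord("John Smith (1830s)", "Smith", "NYC", 1893),
    InfoRecord("Mary Smith (1840s)", "Smith", "NYC", 1902),
    InfoRecord("Adam Jones (1820s)", "Jones", "NYC", 1891),
    InfoRecord("Ann Smith (1850s)", "Smith", "Boston", 1895),
]

matches = query(records, surname="Smith", death_city="NYC", death_decade=1890)
print(matches)
```

Because the filter combines fields at query time, no category such as "Smith deaths in NYC in the 1890s" has to exist in advance, which is the combinatorial-explosion point made above.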


 * Whether or not you agree this sort of solution is necessary, you are free to opt out. The info templates don't assume that there is some special number in the name that it needs in order to function.  It will make no such assumptions, so folks can do "Bob Jones (c1863-??) of Pittsburgh, son of Bill & Mary", and change it to "Bob Jones (c1863-1922) of Pittsburgh, son of Bill & Mary" when they learn the death date, and rename it to "Bob Jones (c1863-1922) of Pittsburgh, son of Bill & Catherine", when they learn Bob was actually the son from an earlier marriage etc etc.  No problem.  Just so long as you move the Info page along with it, you will get all the benefits of dynamic query and info pages.  Bit of a hassle though- and most folks aren't going to do it.


 * And really- you don't have to bother with the info page either if you don't want. No one is going to force anyone to do anything they don't want.  All I'm saying is that the info page features aren't going to work without them.  You may not care about them with only 10,000 person articles.  I predict you will care with millions.
 * ~  Ph l o x  02:43, 7 December 2007 (UTC)
 * I have been doing reality checks in my head, you know- creating mental mockups of what the site would look like with these funny numbers at the end, and it seems to me they may be too disorienting, because there is no context of time. It forces the display name be friendly- so the real name with the funny number never shows up in the article.  That is yet another bit of learning curve to add to our contributors-  Like we have to tell them- forget what you know about article name linking.  Instead always use , otherwise you won't get the friendly name displayed.  But even for experienced users, it would be disorienting because look at what it is like in cases when you are trying to sort which person of a similar name is the correct father.  You are discussing these two Joe Unknowns, and the article display says Joe Unknown (1734-1776) but says only   versus   on the edit page.  It's a PITA.


 * So I am not sure my proposal won't create more problems than it solves.  Maybe we just have to keep enough Bot operators trained to make periodic passes after people inevitably leave messes after renaming articles.   ~  Ph l o x   23:54, 8 December 2007 (UTC)
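
For reference, the "let them make up any 6 digit number" idea discussed above amounts to picking an unused number and appending it to the plain name. A minimal sketch, assuming the set of numbers already in use is known; the function names are hypothetical and nothing like this exists on the wiki.

```python
import random


def new_uid(taken, digits=6, rng=random):
    """Pick an unused identifier with the given number of digits and record it in `taken`."""
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    if len(taken) >= high - low + 1:
        raise ValueError("no free identifiers of this length")
    while True:
        uid = rng.randint(low, high)
        if uid not in taken:  # a collision simply means trying again
            taken.add(uid)
            return uid


def uid_page_name(name, uid):
    """Plain name plus the identifier in parentheses, e.g. 'David Henderson (729382)'."""
    return f"{name} ({uid})"


taken = {729382}
uid = new_uid(taken)
print(uid_page_name("David Henderson", 729382))
print(uid_page_name("Joe Unknown", uid))
```

Calling `new_uid(taken, digits=3)` gives the shorter variant floated above, where collisions become likely much sooner; a 4-digit value would run into the "looks like a date" objection.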


 * Instead of a meaningless-looking number, how about a relatively change-proof decade? George Bush (1930s) would be a fairly easy thing to teach people, obviously meaning something to them and avoiding "c" and "bef" and "aft" (mostly) and small year corrections almost entirely. Easy for bot to put them in their "Born in the ....s" category. Until we get another George Bush born in that decade we have two distinct pages linked from our Googletarget George Bush with no doubt in most searchers' minds about which one they want (and no need to disguise the true pagename with a ); when we do get another, we  (ie any ordinary user) can create more distinctive pages with less need for rules to specify the precise form. Robin Patterson 03:57, 9 December 2007 (UTC)
 * Yeah. Small point though- the bot won't treat anything in the article name as data for reasons I was intimating to Kevin.  It keeps it clean.  But your proposal deals with most of the problems.  It is a good 80% solution for name moves, since 20% will be on the decade cusp (1849 vs. 1850 birth), with the remainder (some of my "befores" could be as much as 20 years before my guess; some are based on socio-biological minimums for likely motherhood age).  Anyway, they still are going to call us weird, but I think we are getting closer Robin.    Thanks- it was a good proposal.   I will do some more mental modeling and see if it holds up.


 * By the way, I was a dunderhead for not thinking about this, but I am going to have to crank down the knob on my activities here for the holiday season. I've got 4 kids 2-6 years old and I shouldn't be undertaking anything major like info page switchover until after the 1st.  So I think I will be mostly puttering around here and will hold off on any major bot runs.  So happy holidays and pass the eggnog, heavy on the nutmeg.
 * ~  Ph l o x  04:20, 9 December 2007 (UTC)
 * Robin- Two things on your "1930s" proposal:
 * If there are two George Bushes in the 1930s, we have a disambiguation page. What article names does the disambiguation page point to?  Are you proposing some variation of the funny number then? eg George Bush (395) (1930s)?
 * Bill made a good point in the context of a gedcom discussion about many people's genealogy research being based on tertiary sources- simply copying a relationship because some Gedcom file said there was a parent-child link. So for bulk GEDCOM import, having a vague period in a title like (1860s) is a completely appropriate indication to people of how they ought to regard articles that contain no primary source evidence.


 * If that convention gained traction, then perhaps it would be built on. For example, renames to something specific might be done if the confidence in the information was elevated.  For example, if it turned out that the imported material is verified by a contributor as having sufficiently reliable source material that supports the specifics contained in the article.    ~  Ph l o x   19:30, 10 December 2007 (UTC)
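
The decade suggestion above is easy to state precisely: the page name carries only the birth decade, and a corrected birth year forces a page move only when it crosses a decade boundary. A minimal sketch with hypothetical function names and illustrative years:

```python
def decade_page_name(name: str, birth_year: int) -> str:
    """Plain name plus birth decade, e.g. 'George Bush (1930s)'."""
    return f"{name} ({birth_year // 10 * 10}s)"


def needs_move(old_year: int, new_year: int) -> bool:
    """A year correction only forces a page move if the decade changes."""
    return old_year // 10 != new_year // 10


print(decade_page_name("George Bush", 1934))

# A small correction inside the decade leaves the page name alone...
print(needs_move(1856, 1858))

# ...but the cusp case mentioned above (1849 vs. 1850) still needs a move.
print(needs_move(1849, 1850))
```

The cusp case is why the proposal is described above as a partial solution rather than a complete one.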

Fairly recent comments by a helpful newcomer
(Copied from material written by Vick jay 17:00, 2 July 2008 (UTC) on Forum talk:Organization. Slightly off topic but relevant to the problems that may appear with the proposed "final" page names (the targets of the disambiguation pages) and also relevant to the incentives to persuade most or all contributors to follow the guidelines.)

In the last month, I have "categorized" many, many people. There are thousands yet to do. It's ridiculous.

That tells me others are uploading their family information without a clue as to what to do next. Instructions need to be more clear on how to "categorize" and connect family members AND why you must do it. ...

The current method is not sensible simply because, as it is, you, AMT and others like me must slog through innumerable entries and "fix" them. What a waste of time!

As time goes by and you have more people adding their data, how on earth are you going to handle the same family members with slightly different information?

i.e., Davis Stockton b. c1680, Ireland d. 1761 Virginia and Davis Stockton b. about 1688 County Meath, Ireland, d. 2 Jan 1762 Charlotteville, Albemarle, VA.

As a Stockton researcher, I know they are the same person, but would you? Is someone here going to "edit" the data and merge the 2 Davis Stocktons? What criteria will be used? OR will you leave 2 entries for the same person?


 * (Response from AMK:) Those researching the person will discuss it. If they're the same exact person, they should have the same exact data. If unknown data is estimated, the sources can be reviewed by both parties, in order to determine the most precise data. :- AMK152 (talk • contribs) 03:36, 3 July 2008 (UTC)

(I think I've got the lot there! Now for a bit of reference to subsequent page-creations.) Robin Patterson 13:58, 3 July 2008 (UTC)

Simple Google-friendly disambiguation pages
After some of the above discussions, the Obama page was redirected (as recommended) to the simpler form Barack Obama. It looks good to me (and the amount of detail completely wipes my concern that a disambiguation page might not rank highly on Google).

It is one of a dozen pages currently listed in Category:Similar person names. Eventually there could be millions; have a look at it while it's still small, in case there's a fatal flaw that needs fixing! The pages that end with "(disambiguation)" can probably go the same way as Obama, because the couple I looked at are not following the Wikipedia simple method. I'm happy to discuss that with anyone affected who's puzzled. Maybe read Wikipedia:Wikipedia:Disambiguation first.

The corresponding template is closely based on a Wikipedia one with a sort of acronym for its name; we can copy the short one as a redirect to save typing time.

Robin Patterson 16:21, 3 July 2008 (UTC)


 * I agree with the above. I think someone coming to our site looking for information on their ancestors should be able to type in a first and last name in the search box and find either an article (if that is the only person with that first/last name combination) or a disambiguation page as illustrated above listing all the individuals with that first and last name. I think (although I don't fully understand the discussion on Google) that this would work for Google too. Bill H 15:41, 12 April 2009 (UTC)

How much care do individuals' page names need?
Based on comments by Phlox and Vick jay, among others, I recommend that we keep newly created pages for individuals fairly simple and include:
 * no death years unless absolutely certain
 * birth years only when we are certain about them or when we know there's another person whose page name would otherwise be the same — and even then there may be a better distinction method, like those that Wikipedia uses (and I think that Phlox and AMK and I agree that the use of the Wikipedia name exactly is very often the best idea, for more than one reason)

Many new entries for past centuries will therefore be Google-friendly pages in their own right, with just the first and last name. If another individual of that name appears, the person wanting to create it should:
 * 1) add some disambiguating feature (such as approximate birth year, or a Wikipedia-style word/phrase, in parentheses)
 * 2) "Move" the original to a name that disambiguates it sufficiently too
 * 3) edit the original page (which will have become a redirect) so that it has Template:Similar person names and links to both.
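
The three steps above can be walked through with a toy model in which the wiki is just a dictionary from page name to page text. This is only a sketch: a real "Move" leaves a redirect behind, which step 3 then overwrites, and the function name and sample pages here are hypothetical.

```python
def add_second_person(wiki, plain_name, original_new_name, newcomer_name, newcomer_text):
    """Apply the three steps to a dict that maps page names to page text."""
    # Step 1: the newcomer gets a name with a disambiguating feature.
    wiki[newcomer_name] = newcomer_text
    # Step 2: "move" the original page to a name that disambiguates it too.
    wiki[original_new_name] = wiki[plain_name]
    # Step 3: the plain name becomes a list page linking to both.
    links = "\n".join(f"* [[{n}]]" for n in sorted([original_new_name, newcomer_name]))
    wiki[plain_name] = "{{Similar person names}}\n" + links


wiki = {"John Smith": "Article about the first John Smith."}
add_second_person(
    wiki,
    plain_name="John Smith",
    original_new_name="John Smith (c1800)",
    newcomer_name="John Smith (1825)",
    newcomer_text="Article about the second John Smith.",
)
print(wiki["John Smith"])
```

After the call, the plain name "John Smith" remains the Google-friendly target and simply lists both individuals.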

"Fixing" of existing pages is very low in my priorities. I know one of our biggest contributors has been doing sterling work with things like removing the space between "c" and a date; but I don't think it's really important at that level of detail, because we could easily get another contributor (either by carelessly not searching for near-duplicates, or by using a GEDCOM program that can't search for near-duplicates) putting up a page for that individual with a different (possibly dead-accurate) birth year; the presence or absence of a space in the existing one will be quite irrelevant then. Such near-duplicates will be found, in most cases, if at all, fairly easily on a search hit-list or just a look through a category. (Another potential use of birth decade categories or even century categories, Vick jay!)

Robin Patterson 16:21, 3 July 2008 (UTC)

More?
(Long past my bedtime. See you all!)

Robin Patterson 16:21, 3 July 2008 (UTC) (4.21 am in NZ)

AMK152's comments
I went to google and did some searching:

I searched "Mary Buskirk," my great great grandmother:

Mary Susannah Buskirk (1880-aft1930) appears number 4 on the list. The first 3 results are other Mary Buskirks, not her.

I searched a more common name, "Thomas Putnam," looking for my ancestor. His son, Thomas Putnam was notable enough to have his own Wikipedia article.

Yet, my ancestor, Thomas Putnam (1615-1686) appears at number 7.

Perhaps a very notable person? I searched "George Washington" and he doesn't appear on the first page. This is simply because he is a very notable person, and appears in many locations on the internet. He appears #7 on the list when searching "Genealogy of George Washington."

So we are getting good google rank.

The disambiguation pages without the "(disambiguation)" part I agree with, and have moved some pages accordingly.

This is what I prefer:


 * First Middle Last (YOB-YOD)


 * First names, of course we need.
 * Middle names can be optional, but I would agree with just keeping it to their common middle name. We don't really have to place their entire middle name, like Warren G. Harding's.
 * Surnames, of course we need.
 * Birth year and death year help differentiate between people with the same name. As do middle names. If a person is living only birth year should be shown. Not John Smith (1950-?) or John Smith (1950-) or John Smith (1950-Living). Just John Smith (1950).

- AMK152 (talk • contribs) 21:33, 8 July 2008 (UTC)

AMK152's proposal
I would like to at least propose something that can give us a start.

This is my proposed policy:


 * Format: "Name (YOB-YOD)"
 * For "Name," it contains at least First and Last Name (example: John Smith (1900-1985))
 * If First or Last Name is unknown, use "Unknown" instead of leaving it blank or using a "?" mark. (Example: John Unknown (1900-1985) or Unknown Smith (1900-1985))
 * Surname must be maiden name, not married name.
 * Middle Name is optional. (example: John Isaac Smith (1900-1985))
 * Roman Numerals can be included if the contributors agree (example: John Isaac Smith IV (1900-1985) or John Smith IV (1900-1985))
 * Recommend use of only one middle name (example: use "John Isaac Smith (1900-1985)" instead of "John Isaac Bartholomew Robert Smith (1900-1985)")
 * Use "?" if year of birth or death is unknown, don't use "unk" or "unknown"
 * Use "c" if year of birth or death is approx., don't use "c." or "abt" or "about"
 * Use "bef" if year of birth or death is before the indicated year.
 * Use "aft" if year of birth or death is after the indicated year.
 * For living individuals, use only birth year in parentheses.

Basically, these have been around and used traditionally, but I suggest we at least make it official. - AMK152 (talk • contribs) 21:33, 8 July 2008 (UTC)
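For anyone scripting page creation, the rules listed above can be sketched in a few lines of Python. This is only an illustration, not an agreed tool: the function names `format_year` and `page_name` and their parameters are invented here, keeping just the first middle name is one reading of the "only one middle name" recommendation, and Roman numerals are left out because they are still under discussion.

```python
from typing import Optional

# Qualifiers allowed by the proposal: none, approximate, before, after.
ALLOWED_QUALIFIERS = ("", "c", "bef", "aft")


def format_year(year: Optional[int], qualifier: str = "") -> str:
    """Render one year: '?' if unknown, else an optional 'c', 'bef' or
    'aft' prefix joined to the year with no space or full stop."""
    if year is None:
        return "?"
    if qualifier not in ALLOWED_QUALIFIERS:
        raise ValueError(f"unsupported qualifier: {qualifier!r}")
    return f"{qualifier}{year}"


def page_name(first: Optional[str], last: Optional[str],
              middle: Optional[str] = None,
              birth: Optional[int] = None, death: Optional[int] = None,
              birth_q: str = "", death_q: str = "",
              living: bool = False) -> str:
    """Build 'First Middle Last (YOB-YOD)' following the proposed rules."""
    # Missing first name or surname becomes "Unknown", never blank or "?".
    parts = [first or "Unknown"]
    # Assumption: keep only the first of several middle names.
    middle_parts = middle.split() if middle else []
    if middle_parts:
        parts.append(middle_parts[0])
    parts.append(last or "Unknown")
    name = " ".join(parts)
    if living:
        # Living individuals: birth year only, no hyphen.
        years = format_year(birth, birth_q)
    else:
        years = f"{format_year(birth, birth_q)}-{format_year(death, death_q)}"
    return f"{name} ({years})"
```

For example, `page_name("Mary", "Buskirk", middle="Susannah", birth=1880, death=1930, death_q="aft")` gives the existing page name Mary Susannah Buskirk (1880-aft1930). The surname passed in is assumed to be the maiden name already; the sketch cannot check that.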


 * All very reasonable. However, I would except the following:
 * Roman Numerals can be included--I wouldn't recommend setting this up as a standard. Use of Roman numerals is really a "by-name", which is fairly arbitrary, and creates similar problems to the use of Jr. and Sr.  If someone REALLY wants to use them, then setting up a redirect from the "by-name" to the formal name would solve the problem. That way you could use the by-name in an article, but still link to the formal name.


 * For living individuals, use only birth year in parentheses. A) I wouldn't put in a living person in the first place, and B) if it were done I'd go with (Living). Bill Willis 12:57, 9 July 2008 (UTC)


 * I agree with the Roman numerals part. Thing is, articles of living people have been created. People have created their own article including their birth year by choice. I say include the birth year, if the person wants it provided. If the individual does not provide or does not wish to provide their birth year, then they can use the (Living). That can be used by people who don't want to reveal their birth year. However, I do worry about a lot of articles with the same name, such as John Smith (Living) or even Living Smith (Living). Plus, notable individuals have their birth year revealed as they are politicians, celebrities, rich people, etc. (Such as George W. Bush (1946) or Jessica Ann Simpson (1980) or Donald John Trump (1946)). The (Living) can be used when a living person's birth year is not specified or the person doesn't want it specified. - AMK152 (talk • contribs) 13:29, 9 July 2008 (UTC)

Is there any further news on this proposal? I'm new here, and Genealogy:Page_names is not as clear as it could be. I agree with the above proposal, but really I think it's more important to have some solid guideline to which people can work, and spell it out very clearly. — Sam Wilson ( Talk • Contribs ) … 01:04, 12 October 2008 (UTC)


 * Sam, the above "Proposal" is what most of the recent active contributors are using for most pages. Details of how many middle names and where the Roman numerals go are fairly insignificant in the overall scheme and can be personal preference if there is only one contributor-relative (because they are likely to involve unusual names that will appear in the same section of any automated listing and thus be quickly merged if there are in fact duplicates). We overcame the Google-search problem by introducing the simple name idea that produces a page such as Mary Brown for the search engines (and internal searches) to find; users are welcome to create such a simple page linking to each full-name page using the explanatory page for which the short-cut is hndis (which is short for "human disambiguation" and is a term borrowed from Wikipedia but used slightly differently here). If Genealogy:Page_names is not improved/updated soon, give me a reminder! Robin Patterson 02:35, 12 October 2008 (UTC)

Okay, I've melded the above points into that page, and cleaned the whole thing up a fair bit. I've tried to keep every point and example that was there previously, but just make them clearer. What do you think? (I'll post something to the talk page there, too.) — Sam Wilson ( Talk • Contribs ) … 06:30, 12 October 2008 (UTC)

I agree with the above proposal and am making changes to the few pages I have created that do not follow it (things like using "?" for an unknown name instead of "Unknown"). I do not understand why we would use Roman numerals and not Senior or Junior when they are more common (as far as I know). I am also hesitant to add new pages or upload my comparatively modest 4,000 plus GEDCOM until this is resolved. Bill H 15:49, 12 April 2009 (UTC)

Needs more certainty for GEDCOM upload
Material from Gedcoms needs to be formed into a page name (and distributed elsewhere in the article and/or its info page). Brian Yap (User:Yewenyi) was doing that, and any usable variation of his program will do the same. The program then checks whether there is a matching existing page. I'm not sure whether it will be able to recognize only a precise match or can raise the question if certain elements match; the latter would be preferable. With GEDCOM upload seeming much closer this month, we should firm up the standard that the program is to follow.

— Robin Patterson (Talk) 11:51, 6 April 2009 (UTC)

Checking for duplicates
Our existing 20,000 person pages are mostly (though far from overwhelmingly) "Firstname Middle-if-known Surname (YOB-YOD)". It may be possible for the program to ignore middle names in its first check on whether a new person from a GEDCOM is a duplicate. That would throw up a larger number that might be duplicates, where one contributor omitted a middle name but the other included it. Better to have two similar persons put on a single page for examination than to have them treated as different if they might not be. Human inspection can then sort them out at leisure.

— Robin Patterson (Talk) 11:51, 6 April 2009 (UTC)
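The middle-name-insensitive first check described above could look something like the following Python sketch. It is not Brian Yap's program or any existing upload tool; the names `loose_key` and `possible_duplicates` are made up for illustration, and it assumes page names of the form "Firstname Middle Surname (YOB-YOD)". Titles such as "William I, King of England" or names ending in a Roman numeral would need extra handling.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Splits "Name words (years)" into the name part and the years part.
PAGE_NAME = re.compile(r"^(?P<name>.+?) \((?P<years>[^()]*)\)$")


def loose_key(title: str) -> Tuple[str, str, str]:
    """Reduce a page name to (first name, surname, years), dropping any
    middle names so that titles differing only in those compare equal."""
    match = PAGE_NAME.match(title.strip())
    name = match.group("name") if match else title
    years = match.group("years") if match else ""
    words = name.split()
    if not words:
        return ("", "", years)
    return (words[0].lower(), words[-1].lower(), years)


def possible_duplicates(existing: Iterable[str],
                        incoming: Iterable[str]) -> Dict[str, List[str]]:
    """Map each incoming page name to the existing page names that share
    its loose key; incoming names with no candidates are left out."""
    by_key: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for title in existing:
        by_key[loose_key(title)].append(title)
    result: Dict[str, List[str]] = {}
    for title in incoming:
        candidates = by_key.get(loose_key(title))
        if candidates:
            result[title] = list(candidates)
    return result
```

The result is only a list of candidates for human inspection. A stricter or looser key (for example, ignoring the years as well) is a design choice the sketch does not settle.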

Middle names
Whether or not the program can ignore middle names as above, I'm inclined to recommend that they not be part of future page names, for at least three reasons:
 * Reduce potential programmatic duplication
 * Raise Google rank
 * Match Wikipedia

The full name, titles, etc, etc, will appear very close to the top of the article for quick visual checking in case the disambiguation (hndis) page does not have enough detail.

— Robin Patterson (Talk) 11:51, 6 April 2009 (UTC)

Years
Similarly with death years (and maybe even birth years??) unless there's fairly good documentary evidence? We have several hundred pages that have only a birth year; so omitting all of them in future would not be totally radical. We have even more that have a question-mark for birth year and/or death year; nothing lost if those "years" get deleted. Wikipedia generally has no dates in person-article names. Several times, the value of having the exact same name as on Wikipedia has been recommended (because of great time-savings in view of our extensive use of Wikipedia text). To require all GEDCOM-derived page names to have no year provision would not be totally radical and would have some advantages as above:
 * Reduce potential programmatic duplication
 * Raise Google rank
 * Match Wikipedia

— Robin Patterson (Talk) 11:51, 6 April 2009 (UTC)

Compensatory distinguishing text
Wikipedia has a variety of methods, such as "(politician)". Where we follow Wikipedia we will get that anyway. Why not use it as one of several optional distinguishers in the future?

— Robin Patterson (Talk) 11:51, 6 April 2009 (UTC)

I like (what I take to be) the present standard: Full name (YoB-YoD), so I don't like your proposal. I think "Raise Google rank" is "the tail wagging the dog", and anyway I usually use full names with a Google search as the "First-Name Last-Name" search usually gives too many irrelevant hits. Thurstan 12:02, 6 April 2009 (UTC)
 * I agree with Thurstan. - AMK152 (talk • contribs) 01:57, 9 April 2009 (UTC)

There is another issue which hasn't been mentioned which I see occasionally, which is spurious links: both Robert II, King of Scotland (1316-1390) and James II, King of Scotland (1430-1460) are listed as having a daughter named Margaret (Stewart), about whom nothing more is documented. But when these two pages were created, their child lists pointed to Margaret Stewart (?-?), which used to be (before I changed it) a redirect to Margaret Stewart (1206-1255), a totally different woman. So if we start giving pages "generic" names, we are going to get these spurious links. I don't think I should have to invent "Compensatory distinguishing text" for people that I have no information about (and I don't plan to name the link Margaret Stewart daughter of James II, King of Scotland, though perhaps that is what I have to do, in view of my next parenthetical). (My philosophy is that if I have nothing else to say about a child, in particular for people who died in infancy, then they don't need their own page. So I would see those "Margaret Stewart (?-?)" links as permanently red.) Thurstan 04:47, 9 April 2009 (UTC)

And the same occurs with parents: if I know that Mary Smith's father was John (because that's what her death registration says, so it could be doubted), I would like to show it as John Smith (?-?) and never have it point to anything. If I get more info about Mary's dad, I should be able to fill in at least one date, change the link, and then create a page with a less generic name. Thurstan 04:51, 9 April 2009 (UTC)


 * Good points. I suggest that if no dates are known, we just keep it as "John Smith" or "Margaret Stewart." This way, the links will go to disambiguation pages. If a page for that person exists, the link can be changed. If not, we know that that particular person does not have an article yet. And just as you said, don't create the article if hardly anything is known. Just wait until there is more info. - AMK152 (talk • contribs) 04:57, 9 April 2009 (UTC)
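The suggestion above can be written as a tiny rule for choosing a link target. This Python sketch is only one reading of it, with an invented function name `link_target`: when neither year is known, the link goes to the bare name (the hndis page) instead of a "(?-?)" title that might later be redirected to the wrong person.

```python
from typing import Optional


def link_target(first: str, last: str,
                birth: Optional[int] = None,
                death: Optional[int] = None) -> str:
    """Choose where a child or parent link should point."""
    name = f"{first} {last}"
    if birth is None and death is None:
        # No dates at all: link to the plain-name disambiguation page.
        return name
    # At least one year known: use the dated form, '?' for the other.
    birth_text = "?" if birth is None else str(birth)
    death_text = "?" if death is None else str(death)
    return f"{name} ({birth_text}-{death_text})"
```

So `link_target("Margaret", "Stewart")` gives "Margaret Stewart", while `link_target("Margaret", "Stewart", 1206, 1255)` gives "Margaret Stewart (1206-1255)".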

Google rank
Genealogy is 48 on a Google for "Richard Tol", which is pretty good.

But, if you want to improve that, two things matter for Google. First, what do people look for? "William I King of England" ranks 9th, while "William the Conqueror" ranks 50+. This is partly because our page name refers to the king, not the conqueror.

Second, outside links matter, which is why I put two templates on Wikipedia that refer back to us. Rtol 12:20, 6 April 2009 (UTC)

I prefer using William I, King of England. There are many names we could use. I went to Google and found the following:


 * People seeking genealogical information on William who get Familypedia:
 * "William the Conqueror genealogy" ranks 6th on Google.
 * "William I, King of England genealogy" ranks 1st on Google.


 * Searching "William I, King of England" on Google brings Wikipedia's article up 1st.
 * Searching "William the Conqueror" on Google brings Wikipedia's article up 1st.


 * Actually, whenever I want to search for something on Google, typically the first result is Wikipedia. Just like Rtol said, link Familypedia's article from Wikipedia. Eventually, we could have as many as thousands of links to Familypedia all across Wikipedia. And just think of how many people use Wikipedia. It's ranked as one of the most visited sites on the Internet. Links on Wikipedia itself will bring in people, especially as Familypedia grows. - AMK152 (talk • contribs) 01:57, 9 April 2009 (UTC)