Familypedia
Revision as of 19:00, 6 October 2007

Forums: Index > Watercooler > How we encode our data


This is about Metadata. Feel free to skip the theory to get to the meat: #Using /info subpages


There are some in the wiki and internet community who advocate representing information in a way that computers can evaluate. This movement is relevant to genealogy enthusiasts since, unlike most genealogy programs, a wiki has none of the "database"-like features for reusing information. With genealogy programs and "databasey" genealogy sites like Genealogics, when you change a birthdate for a subject, every page using that birthdate automatically gets the change. Such information sharing makes it easier to keep the database tidy. Beyond this housekeeping advantage, the high-tech advantages of declaring such information formally include the ability to make inferences from it. Genealogy information is simpler than more general domains of information representation, so problems such as identifying inconsistencies do not involve complicated logic. E.g.:

  1. Joe was married to Mary.
    • The date of this event was X.
  2. Child Y 's mother was Mary.
    • The source of this idea is A.
  3. Child Y's mother was Jane.
    • The source of this idea is B.

Situations with conflicting information, such as item 2's version of the truth and item 3's alternate view, are well known to anyone who has dabbled even briefly in genealogy research. Strict genealogy systems sometimes have problems representing inconsistent or ambiguous information, but wikis have no such constraints. The two approaches are not mutually exclusive. In some future genealogy wikia, such ambiguous and contradictory information could alter the probabilistic confidence of particular views of a family history using stuff like Bayesian inference.
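To make the "no complicated logic" point concrete, here is a minimal sketch of spotting the Mary/Jane conflict from the list above. The tuple layout, the `SINGLE_VALUED` set, and the `find_conflicts` name are all assumptions made for illustration; nothing like this exists on the wiki.

```python
from collections import defaultdict

# Hypothetical claim records: (subject, relation, value, source).
# The names mirror the Joe/Mary/Jane example; the layout is an assumption.
claims = [
    ("Joe", "spouse", "Mary", None),
    ("Child Y", "mother", "Mary", "A"),
    ("Child Y", "mother", "Jane", "B"),
]

# Relations for which a person can have only one true value.
SINGLE_VALUED = {"mother", "father"}


def find_conflicts(claims):
    """Return {(subject, relation): {value: [sources]}} for every
    single-valued relation that carries more than one distinct value."""
    seen = defaultdict(lambda: defaultdict(list))
    for subject, relation, value, source in claims:
        if relation in SINGLE_VALUED:
            seen[(subject, relation)][value].append(source)
    return {
        key: dict(values)
        for key, values in seen.items()
        if len(values) > 1
    }


conflicts = find_conflicts(claims)
for (subject, relation), values in conflicts.items():
    print(subject, relation, values)
```

Both views are kept along with their sources, rather than one being thrown away, which is the wiki-friendly behaviour described above. Weighing them with something like Bayesian inference would be a separate step.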


Gedcom represents some of this information, but LDS's goal[1] for Gedcom was that it be a format for exporting or importing data to various programs or internet sites, nothing more. The Gedcom 6.0 (XML) format continues to confine itself to that goal, and explicitly states this in its draft spec.[1]


In the Wikicommunity, many infobox templates record information conforming to the hCard microformat. This sort of encoding can potentially support a superset of the information that Gedcom 6.0's XML format will support. If we wish to follow that sort of direction, the Gedcom 5.5 Java program that converts to Gedcom 6.0-like XML, or alternatively to Resource Description Framework (RDF) format, might be of interest. Further information on the program and discussion of the issues for such semantic representation of genealogical information may be found on Jay Askren's site.


Another method of encoding metadata for a person has been advanced by the Biography project. They use "Persondata" information contained in commented text placed at the end of the article. The advantage of this is that it is unobtrusive: no one is required to use infoboxes for their articles. The disadvantage is that if people don't see the information in the resulting article, there is no incentive to keep it valid with respect to other information in the article. When push comes to shove, I think that sooner or later we will have some kind of standard infobox to normalize the appearance of articles.


What does the hassle of conforming to such templates or supporting these hidden blocks of information buy us? Well, brushing aside all the gee whiz applications of semantic databases, our genealogy wikia would eventually benefit from very practical features, such as the simple idea that it allows information to be shared between articles. Meta's Semantic Mediawiki extension supports encoding data in a central way that can be accessed anywhere in the wiki. It looks like normal wikitext. For example, a person article for Joseph Hester might have the text:

Joseph's parents were [[father is::Elias Hester (c1832]]. 

Now, any time this information is updated, everyone that wants the change can get it. E.g. I have a family tree, and for one of the cells I can hardcode Elias Hester, or I can simply put

[[father of::Joseph Hester (c1858)| ]]

Some of this stuff is working today (see the example for California at the ontology semantic wiki page [2]). When it matures, it is surely something that future contributors to Genealogy wikia will want to begin to use. Note that it is just another wikitext operator, and this doesn't impose any radical demands on authors. It can be ignored by the majority of contributors, but I expect it will gradually gain many converts simply due to time savings. It can be used in an evolutionary way, and I expect the transition will be fairly gradual, with a mixture of hardcoding and re-using data. This will suit wikia managers very well, because the server load created by complex templates using such queries is not well understood. It could be that caching will make it a non-issue, but note that data dependencies are multiplied. Change the father is:: relation for William the Conqueror, and you could potentially invalidate the cached pages of hundreds and hundreds of pages using this information. It's also impossible to predict what the issues will be with vandals. The same issue arose when Wikipedia first started (the objection was that allowing users great power means they will abuse it). Come to think of it, I think the nobility said the same thing about allowing the rabble to vote. Anyhow, a gradual transition allows everyone to learn and adapt.


Other explorations of interest:

  • Microformats and genealogy information [3]
  • Inline queries using Semantic mediawiki extension [4]
  • Meta's article on the extension: Semantic MediaWiki


For the foreseeable future, we cannot predict how the data representation formats will evolve, and can only adapt along with them. At some point, it is inevitable that Genealogy wikia will have a data mass sufficient to earn us a seat at the table so that we may positively influence such evolution.


For the near term, we should encourage folks to encode information using standard templates such as Template:Person. This will help the future upgrade of the data to representations such as the above.


Secondly, an important point was made by Askren on his page. The fundamental issue with data interchange is determining whether Person A in an input file corresponds to a Person B who has the same name, birthdate, and birth location but a different parent than Person A. Jay Askren noted that globally unique identifiers have been in use by LDS for some time to deal with this, and considered using AFNs (Ancestral File Numbers) for that purpose. The problem he noted is that the mechanism for creating new ones is controlled by the LDS organization, and it is not clear how open that process is to other contributors. Perhaps it is no big deal: if LDS would parcel off authority for ranges of numbers and trust other organizations (e.g. genealogy wikia) to see that they are being used properly, that seems like it would be acceptable.


Another proposal is to carry our own globally unique identifiers (GUIDs, pronounced gooeed). Data import programs would specify the AFNs if they are passed in a gedcom file, but as part of the import we would also specify our own unique identifiers for all new records. E.g. when we start exporting data from genealogy wikia, we make a pass over all articles and generate GUIDs for them using something like a GUID from a site like this. And we just periodically update the persondata (or alternatively template:person) UID field of all new pages with these GUIDs. A bot would also periodically resurrect any inadvertently deleted GUIDs. These GUIDs are not typically displayed, but are used for matching when importing/exporting data and when looking up data.
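As a rough illustration of the periodic pass described above, the sketch below hands out a random GUID to every record that lacks one and leaves existing ones alone. The `pages` dictionary and the `uid`/`afn` field names are stand-ins invented for this example, not the actual template fields, and a real bot would read and write wiki pages rather than a Python dict.

```python
import uuid

# Hypothetical in-memory stand-in for /info records keyed by article title.
# The "afn" and "uid" field names are assumptions for illustration only.
pages = {
    "William I, King of England (1027-1087)": {"afn": None, "uid": None},
    "Joseph Hester (c1858)": {"afn": None, "uid": "existing-id"},
}


def assign_missing_guids(pages):
    """Give every record without a uid a fresh random GUID.

    Returns the list of titles that were changed, so a bot could
    report or log what it touched.
    """
    changed = []
    for title, info in pages.items():
        if not info.get("uid"):
            info["uid"] = str(uuid.uuid4())
            changed.append(title)
    return changed


changed = assign_missing_guids(pages)
print(changed)
```

Running the pass a second time changes nothing, which is what makes it safe to repeat on a schedule. Resurrecting an inadvertently deleted GUID would additionally need a record of the old value (for example from page history), which this sketch does not attempt.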


Which brings me to why I am thinking about any of this now. It is my intention to add an AFN and a GUID field to /info subpages of articles. This represents information that is a superset of information in Template:Person. It supports Wikipedia's hCard metadata as well as the Persondata metadata that the Wikipedia Biography project is using. This is good stuff for drawing in new folks because Google and other search engines recognize these fields.

Using /info subpages

This metadata approach for Genealogy allows authors to re-use data now, without any SQL queries or waiting for some unknown date when Semantic wiki extensions will arrive.

Beginning today, it now is possible to do queries. EG:

{{get|William I, King of England (1027-1087)|key=birthdate}}
produces: Template:Get

Similarly,

Father is:Template:Get
Image is:[[Image:Template:Get|100px]]

Of course, if the "Get" occurs in the William article, the query is compact:

{{get|key=father}}
gives:
  • Template:Get
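For readers who think in code rather than templates, the lookup behaves roughly like the sketch below: a key is fetched from a named article's /info data, and the article defaults to the current page when omitted. The `INFO_PAGES` data, the placeholder names and dates, and the `current_page` argument are all made up for illustration; the real template is wikitext, not Python.

```python
# Hypothetical /info data; keys and values are placeholders,
# not the contents of any real page.
INFO_PAGES = {
    "Person A": {"father": "Person B", "birthdate": "1 Jan 1900"},
    "Person B": {"birthdate": "1 Jan 1870"},
}


def get(key, article=None, current_page=None):
    """Mimic {{get|article|key=...}}.

    When no article is given, fall back to the page the call appears on,
    which is what makes the in-article form so compact.
    Returns an empty string if the page or key is missing.
    """
    title = article if article is not None else current_page
    return INFO_PAGES.get(title, {}).get(key, "")


# Explicit article, as when quoting data from another page.
print(get("birthdate", article="Person A"))

# No article named: the lookup uses the page being rendered.
print(get("father", current_page="Person A"))
```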

This is not much longer syntax than what is required for the semantic wiki wikitext, but semantic wikitext will be better for many reasons, not the least of which is that it is definitely simpler and more natural to specify. More importantly, such a future approach will be more robust. Currently, if someone moves an article, they will have to remember to move the /info page as well. The advantage of /info subpages is that they deliver the bacon today, and support all the goodies of the microformats as well as the Persondata initiative. In my opinion, the GEDCOM bot should produce /info oriented pages. That means that this will not just be some rare thing, but a substantial number of genealogy's pages could be using this.


Want to kick the tires? Take a look at the William the Conqueror example and simply create a /info subpage for any article. Or play with William's /info page. Give it a whirl.


One of the benefits of doing this encoding is that family trees will now automatically update, and with richer info. No more specifying all the levels of the tree, or searching and updating all the fricking trees that might be affected by a newly found ancestor, or worse- fixing a mistaken parentage- an error that cascades to many cells of a tree. Since the entire tree can be inferred from the head node, all you have to do is simply plop a single Ahnentafel template with no parameters on the page and you are done. You need not specify anything since it will assume the name of the article as the head node. This will all be very simple, that is, unless you have moved the article 12 times because you keep renaming it because you felt like overspecifying middle names, death dates etc. Whatever- each time you fiddle with stuff this way, you must now move the metadata page too. To each his own.
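Why the whole tree can be inferred from the head node: in Ahnentafel numbering the head is 1, and person n's father is 2n and mother is 2n+1, so following the father and mother entries upward fills in every cell. The sketch below assumes each record exposes `father` and `mother` keys; the `FAMILY` data and the `ahnentafel` function are hypothetical and only illustrate the idea, not how the template is actually written.

```python
# Hypothetical /info-style records; names are placeholders.
FAMILY = {
    "Child": {"father": "Father", "mother": "Mother"},
    "Father": {"father": "Paternal Grandfather"},
    "Mother": {"mother": "Maternal Grandmother"},
}


def ahnentafel(head, info, levels=2):
    """Return {ahnentafel_number: name}, starting from the head node.

    Person n's father is numbered 2n and mother 2n + 1.
    `levels` is the number of ancestor generations to climb.
    Unknown parents are simply left out of the table.
    """
    table = {1: head}
    frontier = [1]
    for _ in range(levels):
        next_frontier = []
        for n in frontier:
            record = info.get(table[n], {})
            for offset, key in ((0, "father"), (1, "mother")):
                parent = record.get(key)
                if parent:
                    table[2 * n + offset] = parent
                    next_frontier.append(2 * n + offset)
        frontier = next_frontier
    return table


tree = ahnentafel("Child", FAMILY)
print(tree)

# Fixing a mistaken parentage in one record updates every tree built from it.
FAMILY["Father"]["father"] = "Corrected Grandfather"
fixed_tree = ahnentafel("Child", FAMILY)
print(fixed_tree[4])
```

Because the table is rebuilt from the records each time, a correction made in one place shows up in every tree that passes through that person, with no cells to hunt down by hand.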

Yes folks, we get rich trees without all the bother of cutting and pasting repetitious info- which of course no one will do unless they are super dedicated to a particular ancestor. Template:Ancestors2

Naturally, you can specify the start of the tree so that you can display the tree of any ancestor from another article. Eg. the example above was generated with:

{{Ancestors2|William I, King of England (1027-1087)|display=full}}

By omitting the display=full option, you get a compact tree without pictures and birth/death dates.

Globalization: Reuse means that a lot of the drudgery of keeping various language versions in sync will now be removed. EG. slip the lang parameter in there, and you have:

Template:Ancestors2


For the purposes of these examples, I only supported the 2 level tree. I will fix the 6 level one in due course.

References: Indicating unique identifiers used by other sites is the perfect way of inserting a friendly link to our peer sites on the web. We are all working to preserve our ancestors' history; by standing on each other's shoulders we work together toward that goal. Plus, Google values our hits higher the more links we have to high value sites. So insert this template into your notes or links section:

{{get references}}
for William example gives:

Template:Get references Don't worry. Be happy.

~ Phlox 02:48, 5 October 2007 (UTC)

Notes

  1. ^ The church of Latter Day Saints (LDS) authored the Gedcom spec. It has been a great contribution to the community.

Robin's first response

Haven't read every word above, but I think I've read enough to be highly appreciative. I'm somewhat familiar with GEDCOM format and with XML. Your ideas seem to be the right way to go, especially if it can remain compatible with larger organizations such as WP and LDS. My guess is that Bill will like it too.

You mentioned the GEDCOM bot - I hope that's Brian's Help:Loading Gedcoms program and I hope you can

  1. make it more usable (or the instructions clearer) for those of us who are not sure whether we could import or use all the necessary Java etc
  2. tweak it so that it produces pages that conform with your above ideas or any variations of them that might achieve consensus

Robin Patterson 11:21, 5 October 2007 (UTC)

response re Gedcom bot on Forum:Gedcom bot -~ Phlox 23:53, 5 October 2007 (UTC)