User:Phlox/2009 05 notes

11 May

 * Form junk
 * do extended form
 * preloads
 * short descr - set wide 100%
 * article rating
 * gallery section derived from images
 * building street line too big
 * convert Form long over to international strings
 * propagate updates to the subforms
 * need the international strings fleshed out more completely.
 * #if gallery, br clear all before gallery
 * general sources- where to put the footnote?
 * boilerplate text for empty (new article)
 * siblings- maparray break this field
 * Form for image fields
 * Move over international to mediawiki namespace

random accumulating tasks

 * Redundant copies of data problem: Maybe mimic the "shipping address same as billing address" pattern for forms. Add a copy from spouse for children, wedding details.  This might just display the information and grey the box rather than redundantly copy.  Maybe extend this for siblings (copy from parents) and do this for smwbasepage.  This smwbasepage idea as properties held in common.
 * What is this properties settings procedure. If I set a value of a property with multiple items (eg street addresses), are all of the other previously written ones erased when the page is saved?  How then with ancestors do I iteratively build up a list?  templates with parameters scheme?  or string, then dump out to properties?
 * YDNA form- low priority- not too many people doing this yet.
 * set facts from info- do
 * do a quick check on countries from place fields
 * get children from other spouse check
 * could set up source field as pairs, the first is the cite parameter name, the second is the value. eg cite|title=foo|author=bar = title~foo~author~bar
 * set general info missing properties:
 * fame 1-10, "notability", so that one could conceivably find notable ancestors.
 * female line mechanism? eg- the eve line thing.
 * images, documents, other files.
 * tags
 * run a bot to propagate all event properties uniformly
 * other languages Prototype (a long form?)  Is a lang just a different kind of version page?
 * other stuff
 * multimedia
 * wikipedia references
 * descriptions in alternate languages
 * wedding2-6
 * occupations1-4?
 * education1-4?
 * emigration? or immigration?
 * residences? or mixed with Census event subpage?
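A minimal sketch of the source-field-as-pairs idea from the list above (cite|title=foo|author=bar = title~foo~author~bar). The function names are made up for illustration, and it assumes that neither parameter names nor values contain the `~` or `|` characters.

```python
def cite_to_pairs(cite, sep="~"):
    """Flatten 'cite|title=foo|author=bar' into 'title~foo~author~bar'."""
    parts = cite.split("|")[1:]  # drop the leading template name
    flat = []
    for part in parts:
        # split on the first '=' only, so values may contain '='
        name, _, value = part.partition("=")
        flat.extend([name.strip(), value.strip()])
    return sep.join(flat)


def pairs_to_cite(pairs, template="cite", sep="~"):
    """Rebuild the template-style string from the flattened pair list."""
    items = pairs.split(sep)
    if len(items) % 2:
        raise ValueError("expected an even number of name/value items")
    args = [f"{items[i]}={items[i + 1]}" for i in range(0, len(items), 2)]
    return "|".join([template] + args)
```

The odd-count check is there because a stray separator in a value would silently shift every later name/value pair by one.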

tasks unrelated to smw

 * if it does exist, subst out the ifexist call, or set a maintenance property ifexists not needed with link, and bot eliminate them.

PhloxBot

 * create info page mirror, cleaned of any embedded wikitext.
 * upgrade any dates with embedded aft, before, c1530 to estimate field and proper year, month etc.
 * Kentucky or Tennessee -> Kentucky;Tennessee
 * extract formatted dates from burial, wedding, and baptism.
 * transfer place info to the street address field delimited by semicolons. (explicitly do runs scanning for known states and countries).
 * Clean sources field, replacing * and with semicolons.
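Two of the PhloxBot cleanups above are simple enough to sketch: splitting an embedded qualifier (aft, before, c1530) from the year, and turning "Kentucky or Tennessee" into a semicolon list. This is only a rough sketch. The function names, the output labels ("after", "before", "circa") and the extra abbreviations "bef" and "ca" are assumptions, not anything the bot currently does.

```python
import re

# Qualifier prefixes; "aft", "before" and "c" come from the notes,
# the other spellings and the output labels are assumptions.
_QUALIFIERS = {"aft": "after", "after": "after", "bef": "before",
               "before": "before", "c": "circa", "ca": "circa",
               "circa": "circa"}

_DATE_RE = re.compile(
    r"^\s*(aft|after|bef|before|circa|ca|c)?\.?\s*(\d{3,4})\s*$",
    re.IGNORECASE)


def split_estimate(raw):
    """Return (estimate, year) for strings such as 'c1530' or 'aft 1700'.

    Returns None when the string is not a bare, optionally qualified year,
    so a bot can skip it and leave the value for manual review.
    """
    m = _DATE_RE.match(raw)
    if not m:
        return None
    qualifier = _QUALIFIERS.get((m.group(1) or "").lower(), "")
    return qualifier, int(m.group(2))


def split_alternatives(place):
    """Turn 'Kentucky or Tennessee' into 'Kentucky;Tennessee'."""
    parts = re.split(r"\s+or\s+", place)
    return ";".join(p.strip() for p in parts if p.strip())
```

Requiring whitespace around "or" keeps names like Oregon intact.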

other maintenance sweeps

 * create redlinked children and parent nodes.
 * correct misspellings of states

junk these

 * general info, general sources
 * blank: Template:Get grandparents & subs

wikia/ semantic engineer info requests

 * investigate whether there is a switch to turn off by default the bottom of page rdf/ facts field.
 * "$smwgInlineErrors = false;" so no type errors on blank fields
 * Are there metrics for measuring server load for a page, eg: NewPP limit report?
 * How do I check what the list of environment variables ($smwgInlineErrors) are?
 * What is the expansion limit for client side autocomplete? The forms docs say the default is 1000.  Is that 1000 strings for each field, or 1000 for the entire page?
 * Bug autocomplete (top of screen)?
 * Is there a standard way to stick wikitext in a text field?
 * Images stuck in properties of type page attempt display. Isn't there some way to indicate the right hand operands?  EG image size, no display (prefix with :), etc?  Currently, the default is |frameless|border|text-top]].

random bugs

 * templates that mess up wiki banner
 * template:documentation
 * navbox (probably tbar)

unrelated to smw

 * wikia header bugs- SMWintro- if no ::: indent, then ads display fine. With :::, then you must put a new line between the no include and the documentation template call, or the ads go wonky.  Who knows why- the divs are balanced in the documentation template, so it isn't that.

Model

 * approach1: stay with the subpages thing, (this works because move takes all subpages).
 * In place of a single /info page, there are multiple /theory or /version pages. These correspond to an alternate theory of parentage for controversial individuals.  They also more commonly correspond to alternate Gedcom records, many of which are for the most part redundant repetitions of the same material.  Merging them becomes problematic since they may have been embellished on with some useful information prior to being re published with a new unique gedcom identifier.
 * To represent a gedcom file, a version would have links directly to the /version page of children, wife etc. Later, someone with a genealogy bot tool would be able to make assisted fixups of these records, collapsing them as possible.  The gedcom ID, and possibly a  of the key features of the file would be stored so that further copies would not be redundantly collected  over and over.
 * Theory files would be directly queried instead of the main article page.
 * Every complex data item becomes a subpage, eg: events, residences, marriages, children by marriage (?? actually, it may best be an #ask for Father=X, Mother=Y, but it is unclear- there are complexities to doing it this way- see issues section.) Note: this will generate a lot of clutter for an article.  No better way to do it because we don't have aggregated objects (effectively, there are no n-ary relations).
 * Actually, there are n-ary objects "many-valued properties". However, the current implementation has some limitations that make them unacceptable:
 * You cannot use the special Allows value property to limit the values of any element of a many-valued property.
 * You cannot use the special Display units property to control how a specific element appears.
 * You cannot set the layout of the values; they will always appear as a comma-separated list.
 * You cannot create a timeline query of many-valued properties.


 * variant to 1: All complex objects are subpages, but instead of putting the fields on an info page, put them directly on the main page.
 * objection- this disallows the alternate theory approach, where each /theoryN page is switchable by preference.
 * approach 2: KISS (keep it simple, stupid). The main page carries the properties for the individual. These are the ones accessed
 * These may or may not be copied from the theory/ versions subpages but in any case the field names would be identical.
 * Note, the main page is also the English version. Issue for multilingualism?
 * It doesn't really matter where the properties are stored if they are accessed with a get function. The get function could access what the dominant theory is etc.  But also the theory/ version could be explicitly stated.
 * At first glance this is the most untidy engineering approach. Much nicer if each aggregate data item is stored in a separate page, then common fields like date are not redundantly replicated (birth date, death date, Occupation1 date,...) but are instead a generic date property of a birth, death, or occupation subpage.
 * Problem 1: canonically correct object orientation introduces complexity for template and tool writers, and this is not good given that many are enthusiast/hobbyists. Even simple queries are complex, for instance they will have to do indirection to get any value.  Eg: (#ask (#ask for the birth page name) property from the birth desired)
 * Problem 2: bot tools will find it much easier to deal with simple data models (one page per individual), rather than have to do IO to get at a potentially huge list of separate pages.
 * Problem 3: Competing theories on different facts is an additional dimension of permutations. If our model is a separate subpage for the death event, then what happens when we have an alternate theory of the death event?  Do we create a theory subpage of that death event subpage?  Whoa.
 * Problem 4: Our model for multilingual is to store unique items in a /(lang prefix) subpage. This means that each of the event subpages would also have to have a language subpage.  Not pretty.

Issues

 * No watchlist notification: Too much dynamic updating could present some undesirable side effects. For example, one way of doing children is to not have the parent article declare the children and the children redundantly declare the parent. Instead, have the declaration be in a single place- from the child article.  That way you can query for has father and has mother and get the list for one set of children by one mother, and so on for each mother.  The problem is that the article changes with no watchlist notification, so the user may not be aware that cherished articles are being changed.
 * Interaction with theories. Given the child parent model example above, does the child link to the basepage, or to one of the theory subpages?  If theory page, what if some are dissimilar in unimportant respects, and the child is of theory 2, 5 and 7 versions, but not the others?
 * Maybe it is a good idea to redundantly link these, and fix inconsistencies via manually operated bot. EG: Father theory1 declares who it thinks the children are for each union, and theory2 declares another list.  From the child side, they declare the father BASEPAGENAME from each theory page, enumerating any of the father theories that are excepted.
 * Social factors play into this- you want to have a site that is welcoming, but if you were ever in a bar with military guys, folks will quickly come to blows about whether one guy is being a phoney and claiming he was in a unit or participated in some action that he was nowhere near. So you have all these different versions from different perspectives.  You have the Collective event stored in the Event: pseudo space, then you have all these assertions coming in from all these different articles asserting they were there.  The military history buffs maintain the event integrity by listing the alternate theories etc, but have no interest in mediating these disputes.  So the article declares the dominant theory of who it thinks were at the site, but also does an #ask for all the person Theory pages that claim the person was there.  A bot shows where there are inconsistencies that the contributor may or may not want to resolve.  The point is we make the addition of content relatively painless.  Resolving such problems can be deferred if we have a good manually assisted edit tool.
 * The alternative is to have everything connected, and the military article only knows who participated by the soldiers who assert it. Similarly, the guy interested in just adding the article on his grandfather is not required to create a military battle article with all the particulars of that.  He doesn't want to declare the battle article  just to record the information he wants in his grandfather's article.  If everything is connected, a lot of this work can't be deferred.  In addition, the newbie could be subjected to all sorts of community pressure that he is mucking up articles on collective events (military battle) with unfounded assertions (that his great grandfather single handedly took the hill and saved his unit from certain annihilation).
 * Coordination with RDF- we will use accepted genealogy RDF and other data model ontologies where possible, but our focus is not on solving the database heterogeneity problems in the genealogy community. It's a very hard problem generic to all databases.
 * Background: accurate mapping of fields in databases is a hard problem that has not been solved even where the database is used by the same company, using the same software with the same syntax and the same schemas. For instance, the interpretation of the fields is oftentimes dissimilar between operating groups and so analysis is impossible because apples to apples comparisons are impossible- eg: what is/ is not included in net revenue figures?  One operating group takes some operating expenses out of their net, others exclude these costs in order to inflate their apparent success in generating revenue.  The field is the same, the entire record is the same, the software is the same, yet they cannot be meaningfully compared.

My thinking on this revolves around the question of which approach gets people to collaborate on high value common articles, rather than multiple essentially "owned" private copies of the same individual that, from the POV of the contributor, there are in fact disincentives to merge/ collaborate on. Some of the current ideas are recorded [[Forum:Google_rank#Dutchies|here]], but implementation of gedcom import has not begun so it is early.
 * one gdbi sourceforge post on the issue

Switchable views

 * Scope of investigation: Limited. Investigate this only to the extent of determining that we are not going to paint ourselves into a corner for future options.  Find out how I would likely do it in the future.
 * The solution probably has to do with the way we do user preferences, and so the only way we have of doing that sort of thing is through css and .js - like how we do the date formatting thing. We set a state, the css sets the formatting based on the state.
 * Basic technical field of battle challenge: Due to server loading issues, you can't affect template code or otherwise generate custom articles per user. One way is to generate all views, and unhide the one that the user prefers.  If the user disagrees with the dominant theory, they put in their preference of theory subpage as a property of the basepagename. The value is expressed in the html, and the client side .js code turns on or off the rendering of the data depending on which theory is preferred.  That is one way of doing it.
 * The template emits span display none for the non dominant theories. So folks not logged in/ without accounts see the dominant view.  classes are assigned to these spans so they alternately have the display none overridden, and the dominant spans hidden.  This is the work of the .js by consulting a hidden list of which theories are preferred by which users/ club names.  Your preferences might state- show me the dominant view except in cases where this list of people think it should be something else: User1; My User; Project name; local genealogy group name;  These groups might arbitrate among themselves what the more correct theory is.  So subgroup collaboration can express an effective minority position.
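The selection rule in the last bullet can be sketched independently of the css/.js plumbing. This is just the decision logic in Python, not the client-side code itself. The function names and the shape of the data (a mapping from user or group name to preferred theory, plus the viewer's ordered list of trusted names) are assumptions made for illustration.

```python
def choose_theory(dominant, preferred_by, trusted):
    """Pick the theory to show for one viewer.

    dominant     -- theory shown to anonymous viewers, e.g. 'theory1'
    preferred_by -- hypothetical hidden list: {user or group name: theory}
    trusted      -- the viewer's ordered preference list of users/groups
    """
    for name in trusted:
        if name in preferred_by:
            return preferred_by[name]  # first trusted opinion wins
    return dominant  # nobody on the list disagrees with the dominant view


def span_visibility(theories, visible):
    """Map each theory span to the CSS display value it should get."""
    return {t: ("inline" if t == visible else "none") for t in theories}
```

A viewer with an empty trusted list gets exactly what a not-logged-in reader sees, which matches the default described above.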

Multilingual
The forms code has some autotranslate thing that I didn't read up on. Have no idea what they are doing, but in any case, our problem is bigger, because now we must be switchable not just with respect to the user's theory preference, but also their language preference. Whoo boy.
 * approach1: Main page alternate languages can be base pages (eg for alternate fonts- chinese, greek, cyrillic), but they point to the subpages of the english BASEPAGE, using the language prefix as a subpage eg /fr. Example info page approach for the obama article.

RDF Ontologies

 * GEDCOM RDF mapping

SMW stuff that doesn't work

 * Text input allows text values from a form that will mess up display of the page
 * No way to input wikitext into a field.
 * Hackaround: It would be possible to hack set values eg: ((wp>USS Monitor)) could be parsed and could be displayed.
 * The docs say the following doesn't work, so don't use examples from some sites that may use them:
 * Inverse properties, eg siblings
 * Domain and range restrictions, eg Father
 * Number restrictions and functional properties
 * Transitivity (?)
 * date type requires full date. This is lame because oftentimes we have year only, year month, or circa type dates. (hackaround is to offer year, month boxes for partial.)
 * Forms don't support all table options. eg. background color style, see Form:Demo1, the place subbox should be background light green.  Possible hackaround is to use html td code, and set the css fieldset and legend backgrounds to transparent.  Fix is to look at what Yaron is doing in the php, but I won't get to that for ages, and this is sort of cosmetic stuff anyway.  There are complex browser issues since IE apparently does things differently and special case code is necessary (what a shock). For instance here.

Missing special pages

 * No upload ontology
 * no upload vocabulary

Bug list

 * Bug list


 * Nary does work? Mentions of this in Bug11411 can't have an n-ary relation composed of an enumerated type

Bugs I ran into

 * textarea doesn't pay attention to the size field in order to clip it to a smaller size. It pushes the layout past the page width boundaries unless great care and table gymnastics are resorted to.  workaround is to assume the default size and to work around it with cells.  See long form families table especially.  The sources and notes text area fields interact severely with the side picture without great care.
 * complex layout causes form to "forget" field attributes EG: Partial form set death does not pick up default values for death date-approx or calendar. Works fine on main form.  Image width has correct values set in property, but if it does not redeclare its property, it will not display the pulldown list.
 * workaround: declare property= on the field statements
 * declare does not work if there are spaces after (perhaps before) the equal sign.

PHP fatal error in /usr/wikia/source/releases_200905.1/extensions/wikia/WikiaSpecialUploadInfo/WikiaSpecialUploadInfo.php line 19: Call to a member function getTitle on a non-object
 * date displays time 0:00:00 on second, third, fourth... Forms when editing with Form:person long form. If the page is instead edited with form:set death or form:set wedding1, then the value is set properly.
 * workaround: clip the minutes seconds using the #time function in the set templates. This is a temporary patch.  #time needs a lot of fixup to handle single digit years etc.  The real solution is to fix whatever the extension is doing.  May be a bug, or caused by it getting confused by some of the table formatting stuff I put in the form to make it look less voluminous.
 * Upload bug when loading geer jpg

Limitations & workarounds
'The following section is obsolete after the discovery of how to use stringfunctions to crack the page value so that a string may be extracted. See "Major discovery" below.'
 * As of the time of this writing, for autocompletion to work, it needs pages, not strings. Setting autocompletion on property= some page property works fine, but not with some string property.  Properties do not pick up subproperties, so it seems to me that categories is the best way to go about this- otherwise you have a huge flat namespace that you have no option of segmenting in the future (eg just fill in with counties in Scotland).
 * Autocompletion is fine locally when there are a small number of values. With large numbers, the page load can be very slow, and you may run into a 1000 item max (not sure this is per page or per field).  Remote autocompletion has no limits, the page loads faster, but the autocomplete is slower.
 * workaround plan: autocompletion on category|remote autocompletion.
 * As of the time of this writing, autocompletion box displays at the upper right corner of the page. It is hidden if the page is scrolled.  This is very bad on a large page.
 * workaround: position all autocompletion fields on the first screen full of data.
 * Do not place free text box at top, move it to end of article. This will make regular wikitext reading of the article disconcerting for typical WP users.
 * recommend users go to smaller forms for most of their editing.
 * If a property is a page, then it can't be used for conditional ifs. If you need it to be a page, to get its value, you have to store it as a string property. This makes code complicated.  For example, "La Salle county (Texas)" can't be used in a line that displays city, county, state, because you duplicate texas.  So you need to do a La Salle county .  You can make the short name a property of that article, but if the property is a page, you can't do the decoration with square brackets.  Same problem with displaying surname.  The article name has (surname) postpended.  This shouldn't be displayed, but to strip it, you need access to the string value.
 * Never store articles as pages. Always use strings and use square brackets to display as a link.
 * Counter argument: Properties of type page allow redlinks to go to a default form input. We are doing people that way, so maybe we should do everything that way, and just use the & template to do the dereferencing.


 * My old Info pages stuff is much more efficient than SMW at Query intensive operations. EG, with a version of Jan Willem te Kolstee (1830-1895), a refresh can take up to 4 minutes and in many cases will time out. Processing tree is exactly the same- the only difference is that it is doing an #ask for a parent instead of a template call to return the parent from the info page.  This uses the SMW version of the showinfo ancestors code.  If you have 4 minutes to blow waiting for a page, click this version of the Jan Kolstee article that uses the SMW code.
 * Workaround: see below.

Weird/ anomalous stuff

 * returns lots of pages. Does it recognize pagename as a key word? or is it undefined when first operand is null?
 * form weird syntax:  in page http://discoursedb.org/w/index.php?title=Form:Author&action=edit
 * "author" is the feeding template name for the form, and first name/ last name are parameters from it.
 * Gotchas
 * if you forget to insert the = after the property name, it will print out the field before the value:
 * If you put a link in a text field, (eg: [http:blah blah] ), the property will not be stored, and subsequent #ask's won't work. This may be true if there is any wikitext in the field.  The template page will display the property set statement, since it is not executed.  Solution may be to urlencode all text fields, then unencode them.
 * When you save a page that has properties, it nukes all previously saved properties for the page. So if you remove a line that set the property, the property won't exist anymore when the page is re-saved.  So on every save, every set property statement must execute with prior (or new) values, or they go away.  Weird.
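The save behaviour in the gotcha above can be pictured with a toy model. This is only a sketch of the behaviour as I observed it, not SMW's actual code. The class and method names are made up.

```python
class PropertyStore:
    """Toy model of 'every save replaces all properties of the page'."""

    def __init__(self):
        self._by_page = {}

    def save_page(self, page, set_statements):
        """Store only what this save sets; earlier values are discarded.

        set_statements is a list of (property, value) pairs, standing in
        for the set statements that execute when the page is saved.
        """
        fresh = {}
        for prop, value in set_statements:
            fresh.setdefault(prop, []).append(value)
        self._by_page[page] = fresh  # wholesale replacement, no merging

    def get(self, page, prop):
        """Return the current values of one property (empty list if unset)."""
        return list(self._by_page.get(page, {}).get(prop, []))
```

Under this model, building up a multi-valued list (eg street addresses) means every save has to re-emit the earlier values along with the new one. That is why the values have to live somewhere the templates can replay them from, such as template parameters.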

Cool stuff

 * Timeline output (for all subprops of event date)

Complex types

 * Marriage eg. Edward Riggs (1589)/Holmes-Riggs
 * residences (big because of censuses)
 * All other events with multiples: Occupations, Education
 * Migration event emigration: (from country, ship, ports), moving: mode of transport, reason, motivation
 * Citations? whoa.  That certainly is complex, but will the user have to create a separate page for every one of these jokers?

Small pages
Something goes against the grain about having all these tiny pages sitting around. The tough thing about these events is naming them.
 * 1) They aren't owned by one person- eg the husband. What if it turns out that the name of the person gets changed etc. Do they then have to move 20 of these bittey pages every time they rename?  Hmmm.  Maybe a bot can do this.  And who owns them?  Does the Husband own the marriage event or the wife?  If the naming uses both people's names aren't you just doubling the chances of move due to a changed birth or death date?  OK, maybe these events are owned by one of the parties- doesn't matter who- it's just a unique name.

Implementation

 * Opening via form on redlink- To property "occupation"(s), add value has default form.
 * ??pagename should be a subpage, but user has entered the page name. So they could muck it up. OK.  Maybe they enter a string, then save the form.  On next form load, the template doesn't display the string, and instead creates the real field of type Page, decorating it with the proper prefix.
 * Note that for shared events like marriage or migration, say the husband already created a marriage page. Wife needs to poll the husband to see if a page already exists and link to that first.
 * Maybe there is a smarter form way of doing this.

Pro/Con

 * Pro1: Removes a tremendous amount of clutter from main page.
 * Pro2: Tidies the Property namespace (no occupation8-locality, etc.)
 * Con1: This could potentially disrupt workflow. You have to open a new page to type in marriage info.


 * Conclusion1: Being a subpage doesn't imply anything. It has to go somewhere, and being top level means it will collide with other similar names.  So keep these complex types as a subpage, use some logical naming convention, and if folks want to get fancier later, they may do so.

Tips

 * to have a query return the page name not as a link, but as a string, add the operand " |link=none " to the query.
 * Major discovery- It is possible to derive the string from a page even if it has been returned as a link. You can use string functions on a returned page link, and it turns out that it is just wikitext for a link with bar and right hand value the same as the left.  So you basically divide the number in half, minus the decoration characters, and you have the substring offset.
 * returns:
 * This means we can use pages with a great deal more freedom. They are never opaque, even redlinks.
 * Namespaces not 0: For pages not in namespace 0 the string is at offset 2, not offset 3 as for ns:0. I don't know why you'd ever want to use type page for an image, because it tries to display it. This might make sense for a wiki with images that are already the desired display size, but they seldom are.  Examining the decoration, I see that they are constrained, so this is ok to use.  The decoration currently is:  |frameless|border|text-top]].
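The "divide in half, minus the decoration characters" arithmetic from the major discovery above, sketched in Python rather than with the wiki string functions. It assumes the returned link has exactly the form `[[:Name|Name]]` for namespace 0 (three characters before the name) and `[[Name|Name]]` otherwise (two characters). That form is inferred from the offsets given above and is not checked against the SMW source. Image links with the extra |frameless|border|text-top decoration are not handled.

```python
def page_name_from_link(link, offset=3):
    """Recover 'Name' from a link of the assumed form '[[:Name|Name]]'.

    offset is the number of decoration characters before the name:
    3 for namespace 0 ('[[:'), 2 for other namespaces ('[[').
    """
    closing = 2  # the trailing ']]'
    bar = 1      # the '|' between link target and label
    # What is left after removing the decoration is the name, twice.
    name_len, remainder = divmod(len(link) - offset - bar - closing, 2)
    if remainder or name_len < 0:
        raise ValueError("link is not two equal halves around a bar")
    name = link[offset:offset + name_len]
    if link[offset + name_len] != "|" or not link.endswith(name + "]]"):
        raise ValueError("link target and label differ")
    return name
```

For example, `page_name_from_link("[[:Jan Kolstee|Jan Kolstee]]")` gives back the bare page name, which can then be used in conditionals or stripped of a trailing (surname).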


 * Subproperties means you can search for groups. EG if birth date, death date, marriage date are all subprops of event date, then searching on event date picks up hits on all of them.
 * Forms have a preload option
 * There are set and declare statements:
 * default form for a namespace is set at genealogy:File, genealogy:main and so on...

gedcom

 * very elaborate gedcom origin record
 * with _UID, !LINKS, !OCCUPATION: embedded GEDCOM field names
 * Raw gedcom with sources, evidence/ event types, indirection examples

DNA / YSTR / Haplogroup

 * Family tree projects
 * - Trout

Development sequence

 * First work used subpages to store generic properties. Form:SMW-test3 created an event subpage eg George Spencer Geer (1836)/death
 * Since such compound structs are not possible from the main page, this was the best "clean" approach for data aggregates eg death.date, death.state etc. Although it is best practice from an engineering perspective, and there would have been an economy of property names, it was abandoned for many reasons not the least of which it would have been more difficult for novices to access the values.  The approach inherently requires indirection, and therefore code that references data must do a couple #ask's.  Further discussion may be found above in the Model issues section.
 * Main page forms and partial forms were shown to be able to edit a normal article, with large volumes of properties. Forms were shown to be reformattable for more attractive UIs.  Partial forms were shown to be able to produce bite sized chunks so that the user is not overwhelmed.  New article form "short form" was demo'd for simplicity of initial article creation.
 * Double flushing emerged as a problem. Solution might be to merge the set variables templates into the infobox display.  eg:
 * infobox header,
 * set birth cell template
 * set baptism cell template
 * set death cell template
 * This design would eliminate double flushing, but for items displayed elsewhere on the page (eg gallery section with images from birth, baptism, weddings)- these would require a double flush.
 * Re-ordering: editing with partial templates will reorder the sequence of the infobox cells. EG using form:set death would place the death cell prior to the birth cell.  Bummer.
 * Alternatively, one could tie the setting of elements directly to the display of those elements. So if there were a gallery section, Wedding1 photos would not be set with the wedding1 dates and location, but in the set gallery template.  If everyone agreed to a standard layout, then this might be workable.  From the researcher standpoint, it makes data input workflow more haphazard, since it would not be possible given the current code to display the wedding1 data together, since they would occur in different templates.
 * As of 10 May, the idea is to tolerate the double flushing at least until the reordering problem is addressed.
 * "Everything" Form (Form:Person long form) cannot be burdened with rare and voluminous items (eg occupations 1-8, Military events 1-8, Weddings 1-8, Residences 1-8 and so on). The idea is that for the main events, indicate the first event on the main form and to use partial forms for the overflow.  Less common events like Bar Mitzvah, sealing, adult baptism etc won't have any mention on the main form.
 * can forms be launched from forms?

Field renames/ changes

 * Spouse -> Partner, Children-s1 -> Children-p1 etc per Property talk:Spouse discussion
 * fm-children-S1 fm-children-p1 (no caps in property names)
 * nuke all fm-emigration
 * fm-attendees -> fm-people involved (generic message- must cover cases like remains)
 * fm-migration1 people (if message specific to event is needed)
 * form fields can apply type enforcement ("allows value") if specifically designating a property. This should use the superset property declaration so these may be easily changed in the future.
 * ...radiobutton|property=migration1 date-approx -> becomes date-approx
 * remove superfluous dashes eg date-approx becomes "date approx" (date modifier)
 * Actually- nuke date approx, replace with date modifier.

rejected

 * all events shall now place event at last so that numbers may be postpended. This applies to all events
 * This change is no longer proposed. Reason is ease of use for inline tagging.  See section below
 * wedding1 date becomes date wedding1
 * " birth locality" becomes locality birth
 * Sex -> gender per foaf. Unnecessary.  sex is three letters, and people are used to it.
 * property People depicted-> Depicts people? (FOAF Depicts) depiction? people depicted is natural and will be a synonym of foaf name

additions

 * weight, height, DYS YSTR values each as separate fields.
 * ethnicity?
 * Property:Alternative father1, or Father-a, Father-b etc all as subprop of some "fathers" supercat? Or maybe just father?  (Maybe not the latter, since you wouldn't be able to search just on the likely father, which would be the meaning of the father field.) Maybe not numbers but letters so that dominance is not implied by numbering.  Do we set Father as dominant theory field or do we have a dominant theory field at all (EG not "father" if there is any controversy.  Father becomes father-a, alternate becomes father-b....)? Hmmm.
 * Matrilineal line eg:eve project. Name by oldest known ancestor? or make one up? or base on mitochondrial number of some sort?

SMW does not mean ancestors are reduced to statistics

 * It is not required to use forms to add structured genealogy data. This information can be added inline, rather than using forms and person infobox.  For an example, see Agnes Margaret Mucha (1893-1965).
 * The norm for many genealogy sites is to reduce ancestors to lists of tabular material, and much of this is driven by the way database software works. The obvious nexus between wikis and structured databases is Infoboxes, and that naturally has been the focus for microformats.  It also presents low hanging fruit for SMW, through use of the Semantic forms extension.  However, the core of SMW frees wikis from the tabular approach.  It is entirely natural for family members to present the story of their family as a story, and they will prefer familypedia on that basis.  They may shun the tabular approach, and SMW entirely supports that: full narratives that happen to have structured data within them.
 * It may turn out that we want to suggest that everyone use an infobox for quality reasons (less chaotic look and feel). However, even in that circumstance we might have people putting some optional information inline so that the infobox is less cluttered.
 * Observation- this affects naming. From an engineering perspective, the data types are central.  Dates are a general type, with multiple forms but they are all dates.  Events simply are the variants eg date birth; date death; date wedding; date yada yada yada.  But from the user's perspective, the events are central, and the various details about them are the variants.  Birth date, birth county, birth state, birth people present, birth notes.  Maybe we keep the naming the same.  Today, 25 May, I think so.  Okay.  This would chuck the whole inversion thing.  It would be wedding1 date etc.  Hmmm.  I suppose the variant can go in the middle with not that much difficulty.  Code will look a little uglier, but what the heck.  Ease of inline naming is more important.  I just don't know that folks will do it that much, or that we want that to become a dominant way of stating things.  Hmmm.  If the community decides they like inline, then we would be painted in a corner if they wanted to rename all the properties because that would be really tough.  So let's name assuming ease of inline use, and just accept the slightly greater complexity in the code.  Face it, no one touches esoteric templates anyway, and in the grand scheme of things this "complexity" is trivial for an experienced template writer.

FOAF RDF
I don't see any harm in keeping up to date with this specification, but it is way immature (they have first name as well as given name, surname as well as family name...), and makes some requirements (eg surname is a string, not a page) that we don't want to observe. However, we should keep up to date on these because this will relate us to the larger world, and we want to make sure our semantics of usage is as close to theirs as possible unless there is a very good reason why not.

Descendants/ Ancestors encoding
General problem definition: Exponentiation expansions are generic to this problem domain, so optimizations will be necessary regardless how strong our software engine is at any point in time. In general, we will be using local processes to offload this to the client machines in order to execute massive depth searches or network walking that would time out on the server.

As of the time of this writing, the Semantic mediawiki engine allows traversal of the tree of relationships, but after about the third generation (either from top down children tree walking, or bottom up reading of father mother links), the query response times become excessive.

Solutions:
 * 1) Ancestors: Cache the ancestors tree in a single string field for each person.  For example, cache 3 generations of ancestors using the string packing method I developed for Showinfo ancestors (actually, you could do 5 or eight generations too, but 3 seems like an ok place to start out).   This cache can be reset from the form, and is not reloaded every time.  This way, an ancestor tree could be loaded in lightning fashion because all you are doing is string operations not hitting the disk like a query would.  The way the cache reset button works is that you put a radio button on the form called change this setting to reset ancestors.  User clicks it, the value is passed to the template.  The template code executes, and compares the setting on or off to what the stored (#ask) setting is.  If they are different, do a refresh and update the property value so the next time the function is called, the values will be the same and so the refresh won't happen.  Simple.
 * 2) Descendants: Alternately, by storing the values as multiple value properties, it would be possible to do set operations on descendants or ancestors.  For example, to see if you are 2nd cousins with someone:
 * 3) * Assuming a number convention from parents as generation 1, then a 2nd cousin would share a generation 3 ancestor. Query for ancestors with ahnentafel number from 2^3 to 2^4 minus 1.  Compare with set from the second person, and you have the intersection with one query.  Pretty neat.
 * 4) * Why a mirror descendants tree? Eg:
 * 5) ** Inbreeding analysis: Compare the list of descendants for all brothers and sisters. eg descendants brother1 AND (descendants brother2 OR descendants sister1 or sister2, etc).  Any intersections are inbreeding candidates.
 * 6) ** Social networking: list all the known living descendants of a given individual living in your country, in your own city. Maybe much of this would be private names.  Ok- so list all those who died in the last 50 years.  You could do that by anding the death date with the descendants list.  Right?  You can AND a Page list with an N-ary list, right?  TBD- I need to test that one to see if they implemented it.... 
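The second-cousin check described above can be sketched client side as a plain set intersection. This is only an illustration under assumptions: the trees are taken to be already available as mappings from ahnentafel number to page name, and the function names and sample people are made up for the example. Note that a hit means "second cousins or closer", since siblings and first cousins also share generation 3 ancestors.

```python
def generation_range(generation: int) -> range:
    """Ahnentafel numbers for one generation (parents are generation 1)."""
    return range(2 ** generation, 2 ** (generation + 1))


def shared_ancestors(tree_a: dict, tree_b: dict, generation: int) -> set:
    """Page names that appear in the given generation of both trees."""
    wanted = generation_range(generation)
    names_a = {name for ahn, name in tree_a.items() if ahn in wanted}
    names_b = {name for ahn, name in tree_b.items() if ahn in wanted}
    return names_a & names_b


# Hypothetical data: ahnentafel number -> page name
alice = {8: "Adam Smith (1800-1870)", 9: "Eve Jones (1805-1880)", 10: "Fred Brown (1801-1860)"}
bob = {12: "Adam Smith (1800-1870)", 13: "Eve Jones (1805-1880)", 14: "Carl Green (1799-1850)"}

common = shared_ancestors(alice, bob, generation=3)
print(sorted(common))
```

Generation 3 covers ahnentafel numbers 8 to 15, i.e. 2^3 to 2^4 minus 1, matching the range given in the note.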

20 May Observations:
 * The ancestors list as an N-ary tree is probably what AMK and rtol need for analysis. For exhaustive reports, they probably also need it expressed as a string since it would be able to process very deep trees.
 * It definitely would be handy to have an external process walk these trees to compile these lists, since they can visit different articles and independently update them. Templates would be very challenged to do that.  AWB is looking more and more necessary.

Test observations

 * Wow. Look at the expansions- this caching could be expensive.  Adrianus Korver (1788-1846) has 7 children, and Pieter Korver (1817-1870) his son has 6.  So if that's an average, then Adrianus has 42 records and that is just for detecting marrying cousins.  If the pattern holds then you have possibly 252 descendants in gen3, and over a thousand in gen4.  Let's assume that storing is not a problem- Worst case for gen3 you add say 1K descendants times the 20 bytes for a page name, so 20K per person article, times 100K people is 2 gig.  (times 20 cents/gig = 40 cents).
 * Ok, how about processing. You could hack up some things to do this for a few generations deep but ultimately you come up against some nasty limits so you really don't have a uniform solution with any kind of future.  One tractable way of going about this is to do it externally via AWB and cache intermediate results at the client side, then store it back in a descendants or ancestors fields either using template style, or using explicit double colon style. OTOH, you are tied to a tool that not many people will be able to run, and has an indeterminate future.  Maybe an iterative cascading template approach would work, where each template just does a little bit of the problem- say just goes to 2nd generation of processing, then a 2nd pass picks up the results of the first pass and concatenates them, and so on...
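As a sanity check on the expansion and storage figures, here is a back-of-envelope calculation. It assumes the 7 and 6 children counts from the two Korver pages, 6 children per family in later generations, and the 1K names, 20 bytes per name, 100K articles and 20 cents per gig figures quoted above; none of these are measured values.

```python
# Children per family: 7 and 6 come from the two Korver pages,
# the later 6s assume the same rate continues.
children_per_generation = [7, 6, 6, 6]

descendants_by_generation = []
count = 1
for children in children_per_generation:
    count *= children
    descendants_by_generation.append(count)

print(descendants_by_generation)  # [7, 42, 252, 1512]

cached_names = 1_000      # worst-case names cached per person article
bytes_per_name = 20       # rough size of one page name
articles = 100_000        # person articles on the wiki

total_bytes = cached_names * bytes_per_name * articles
total_gb = total_bytes / 1e9
cost_dollars = total_gb * 0.20  # at 20 cents per gig

print(total_gb, cost_dollars)
```

Under these assumptions the total comes to about 2 gig, so storage is not the constraint; processing is.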

Walk through 1

 * Gen0 Adam
 * Gen1 Adam 2; Eve 3; child's ancestors: = 2^1 (+1)
 * Gen2 Adam 4; Eve 5; Fred 6; Ethyl 7; Darren 2; Samantha 3 ancestors: take Gen1, father side and add 2, mother side add 4.
 * Gen3 Adam 8; Eve 9; Fred 10; Ethyl 11; Derwood 4; Samantha 5  Formula:
 * Gen3 (adam on mother's side:) Adam 12 Eve 13 Fred 14 Ethyl 15.
 * If ahn <4 add 2^1 if male, +2^2 if female
 * If ahn <8 add 2^2 if male, +2^3 if female
 * If ahn <2^4 add 2^3 if male, +2^4 if female
 * Refined: note the centrality of the generation#. The general rule is you always add 2^generation number if the tree is from the father, and add 2^(generation number+1) if from the mother. So how do we derive it? Ok, finding the exponent of 2 is a log base 2 transform.  #expr has logN (ln), so we can do this.  To get log2 from ln, multiply 1/ln(2) times the ln of the ahn# and you have the log base 2.  ln(2) is about 0.693, so 1/ln(2) is about 1.44; we multiply with that and round down to the nearest integer.  That's your generation number (exp2base)
 * Formula: exp2base= floor( ln(AHN)/ln(2) ), roughly floor( ln(AHN)*1.44 ). btw- floor means round down to the greatest integer not exceeding x.  Note that the rounded constant 1.44 undershoots on exact powers of 2 (ln(2)*1.44 = 0.998, which floors to 0 instead of 1), so divide by ln(2) rather than multiplying by a rounded constant.
 * ahn=1734, generation = 10 (since 2^10 = 1024 <= 1734 < 2048 = 2^11)
 * adding 1734 from a father's tree becomes 2758
 * adding 1734 from a mother's tree becomes 3782
 * calculation: 1734 + 2^10 = 2758; 1734 + 2^11 = 3782
 * this power of 2 "exp2base" number is just another word for Generation #.
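The renumbering rule from the walk through can be sketched as follows. This is a minimal sketch, not the template code: the function names are made up, and the generation number is taken from the integer bit length instead of the ln formula, which gives the same floor(log2) value without any floating point rounding.

```python
def generation_of(ahn: int) -> int:
    """Generation number (exp2base): floor of log base 2 of the ahnentafel number."""
    if ahn < 1:
        raise ValueError("ahnentafel numbers start at 1")
    return ahn.bit_length() - 1


def renumber(ahn: int, from_mother: bool) -> int:
    """Move a number from a parent's own tree into the child's tree.

    Father's side adds 2^generation, mother's side adds 2^(generation + 1).
    """
    generation = generation_of(ahn)
    offset = 2 ** (generation + 1) if from_mother else 2 ** generation
    return ahn + offset


# The parent is number 1 in their own tree and becomes 2 (father) or 3 (mother).
print(renumber(1, from_mother=False), renumber(1, from_mother=True))

# Adam is 4 in the parent's tree: 8 on the father's side, 12 on the mother's side.
print(renumber(4, from_mother=False), renumber(4, from_mother=True))

# The ahn=1734 case from the notes.
print(generation_of(1734), renumber(1734, from_mother=False), renumber(1734, from_mother=True))
```

In a template the same thing would have to be done with #expr and ln, where the rounding caveat applies.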

Volume test observations

 * Nothing seems to break. Very large Ahnen values can be stored, and very large numbers of ancestors can be crammed in the Ahnen field.
 * Querying Property:Ahnentafel (currently it is a many valued property) is problematic.
 * We'd like to be able to treat the N-ary property as if it were a normal query result. For instance we would like to process a list sorted by article name, or by Ahn value.  Well, you can't do either as far as I can see.  If display is all you want, then no problem- chop it up and put it in a table with class sortable.
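If the list is pulled out to the client side, sorting either way is simple. The sketch below assumes a hypothetical packed format of "ahn~name" entries separated by semicolons; the real packing used by Showinfo ancestors may differ, so treat the separators and the function name as placeholders.

```python
def unpack(packed: str) -> list:
    """Split a packed 'ahn~name;ahn~name' string into (ahn, name) pairs."""
    pairs = []
    for entry in packed.split(";"):
        if not entry:
            continue  # tolerate a trailing separator
        ahn, name = entry.split("~", 1)
        pairs.append((int(ahn), name))
    return pairs


packed = "5~Samantha;2~Darren;4~Adam"

by_ahn = sorted(unpack(packed))                               # sort on Ahn value
by_name = sorted(unpack(packed), key=lambda pair: pair[1])    # sort on article name

print(by_ahn)
print(by_name)
```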

Duplicates
But if you want to cull duplicates, you need to do successive string searches. If folks wanted to allow multiples eg the person is both their great grandmother on their father's side, but great great grandmother on their mother's side, then you'd do it this way. IMHO that is a rare kind of demand, and it is more desirable to keep the list as compact as possible and cull any duplicates at save time. So as we are adding, we just take each mother side ancestor and do a #pos on the father side ahnentafel list. If there is a hit, we don't add.
 * See interjection at http://genealogy.wikia.com/wiki/User_talk:Phlox/2009_05_notes — Robin Patterson (Talk) 05:12, 26 May 2009 (UTC)
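The cull-at-save-time idea can be sketched like this, assuming both sides are already renumbered into the child's tree and held as mappings from ahnentafel number to page name (the function name and sample data are illustrative). It uses exact name matching on a set; a #pos style substring search would also hit on partial names, so a template version would need delimiters around each name.

```python
def merge_without_duplicates(father_side: dict, mother_side: dict) -> dict:
    """Keep every father-side ancestor; add a mother-side one only if not already listed."""
    merged = dict(father_side)
    known = set(father_side.values())
    for ahn, name in mother_side.items():
        if name in known:
            continue  # already present, so we don't add
        merged[ahn] = name
        known.add(name)
    return merged


# Hypothetical trees where Adam and Eve appear on both sides.
father_side = {2: "Darren", 4: "Adam", 5: "Eve"}
mother_side = {3: "Samantha", 6: "Fred", 7: "Ethyl", 12: "Adam", 13: "Eve"}

merged = merge_without_duplicates(father_side, mother_side)
print(sorted(merged.items()))
```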

Phlox Mining operations

 * Currently the good doctor is involved in paleontological expeditions in the UK, India, Oz, and the USA. The goal is to extract the placename semantics that familypedia needs.

SMW enhanced location templates

 * coor dm and dms were modified to output Property:Coord, probably.  The template does not know whether these are single coords (which in most cases they are, indicating the location of the placename that is the subject of the article), or whether they are on a page with multiple coords, eg a list of mountain locations.
 * coor title dm and dms were modified to output Property:Coord, since this template is for indicating the location of the subject of the article.

Geographic infoboxes
The goal of the modification of these infoboxes is to extract semantic placename information.
 * Extract values that correspond to our country-state-county-locality hierarchy
 * Map to Properties Property:locality of county, Property:county of subdivision and so on.
 * The purpose of this is for querying. EG: the Gedcom says country X.  I have a county name and what looks like a town name but they could be one of many different places.  What is the valid set of counties for country X?  What are the valid set of localities for County Y?
 * Create a clean category tree for use with Autocompletion feature in Forms. EG: Category:Valid name for county of Georgia (U.S. state).  The category structure was created in an ad hoc way by wikipedians and was not intended as a disciplined structure for database error checking purposes.   The Valid name structure has these rules:
 * Names of places in the category structure correspond directly to disambiguated article names eg: Georgia (U.S. state)
 * The category structure is restricted for naming use. No ancillary information such as subcategory Maps of County X.  That stuff goes in the normal categories, which are intended for this purpose (Category:Counties of California) etc.
 * The tree structure is uniform and globally applied, following the country-subdivision-county-locality naming convention.  Note subdivision was substituted for state and has the same generalized meaning as Property:State.  Variation in naming of the categories (for example for localization) should not be necessary since these are invisible categories.  Such variation will not be permitted until we are sure that Autocompletion and error checking functionality will not be impaired.  That is the primary mission of these structures, and if people imagine other uses that stand in the way of that goal, then that is fine, but they should implement such features in a different category tree.
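Because the tree is uniform, the category name for any level can be generated rather than typed. A minimal sketch, assuming the wording of the county example carries over unchanged to the other levels (only the county form appears in these notes, so the subdivision and locality forms are a guess):

```python
# Levels below country in the country-subdivision-county-locality hierarchy.
LEVELS = ("subdivision", "county", "locality")


def valid_name_category(level: str, parent: str) -> str:
    """Build the 'Valid name' category title for places of one level inside a parent place."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    # parent must be the disambiguated article name, eg "Georgia (U.S. state)"
    return f"Category:Valid name for {level} of {parent}"


print(valid_name_category("county", "Georgia (U.S. state)"))
```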

Discussion on extraction
Our genealogy structure is simple and has to do with the places we know from documents. Country-country primary subdivision (Province/State/Canton)- "Localities" (village to City) and the entity in between these last two, usually corresponding to "County".
 * UK
 * Template:Infobox_UK_place (Some of these indicate Chapman codes)  The structure of these is complex, and for the "Counties" field especially, it will be a matter for considerable discussion which UK governmental entities recorded in the infoboxes are most useful to genealogists.  Those interested in the UK may wish to introduce specialized templates and forms so that a richer set of placenames may be searched on exactly as all other global placenames are searched on.  This is tricky but possible using SMW, but I have no intention of spending time on this special case.  We need a generalized structure that works for 85% of the cases, and we probably have that with country-state-county-locality.
 * England
 * Template:Infobox England traditional county
 * Template:Infobox England county
 * Scotland
 * Template:Infobox Scotland county;
 * Template:Infobox Scotland council area;
 * USA: Template:Infobox U.S. County;
 * Global use: Template:Infobox Settlement;
 * Germany
 * Template:Infobox District DE;
 * Template:Infobox German Bundesland;
 * OZ:
 * Template:Infobox Australian Place;
 * Template:Infobox Australian cadastral;
 * India: Template:Infobox Indian Jurisdiction

Generalization
Clearly, we need some facility for refreshing some content from Wikipedia. In particular, during the last year there has been considerable activity geocoding places with highly accurate coordinates. We need a normal way to refresh content without destroying work that Familypedian contributors have added. It seems to me that the information that we really require to be refreshed in an automated way is the semantic information we are extracting.

The place articles need to do a few things
 * Extract SMW information
 * Provide a contact bulletin board for local resources of information like other genealogy sites. This function is now filled by the county navboxes.
 * Allow Contributors to provide Microhistorical accounts that relate particular experiences of their ancestors in these places during different time periods. Local photos, history of various enterprises, etc. should be included as headline material along with general historical information.
 * Historical and biographical information should be prominent in a place article.
 * Wikipedia content may have some value, but should figure less prominently.

Current thinking is that the infobox data from the wikipedia article would be split off into subpages that are transcluded into the main page. These are not intended to be navigated to separately as we do with /biography and /ahnentafel subpages. One subpage would be SMW specific. Another would have Wikipedia content that is locked because any edits to it will be lost when it is refreshed next. The main page has whatever the contributors want to put for that location. This scheme allows a maximum of contributor editing flexibility, while keeping our SMW and Wikipedia reused content fresh using automated tools.

Multilingual vs Multiple language
For visitors to take advantage of the multilingual (Mediawiki message) capability of tables, they have to be logged in, which is too high a barrier. One idea was to use subpages, but many place names are not similar to each other. How does a user find "Den Haag" as "The Hague/Den Haag"? Seems like they should be top level strings. However, that's a lot of names to crowd into the same namespace, and there will be collisions, so what we do is postpend the language code. That means we can tell Den Haag (nl) from an identically spelled article in another language. So the naming standard shall be:
 * Names are as determined by English wikipedia.
 * Familypedia has an article named identically, and it stores the universal properties for the place: the translations to all languages, the coord, containment hierarchy, etc. The other languages must not store this data redundantly.  They will store information local to that language, but properties concerning information that is true across all languages (coordinates etc) are in the english version.
 * Non english versions must postpend their language code to the end of the articles. Placenames must be spelled exactly as they are in the wikipedia for that language.
 * This central (english) version can easily be found by searching for the current pagename in the translated articles field. The only article where the page name will appear is the english article that has that page registered in its language version property.
 * Templates have versions with the same convention of postpended language code, eg nav place becomes nav place (nl). Text is free of the constraints of multilingual messages.  Users just translate the strings as they please and reformat tables as necessary.
 * A small text language tab bar is presented along the top of an article if any alternate language articles are available.
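The postpended language code convention can be sketched as a pair of helpers, for example for a bot or an external tool that has to map between the central article and its translations. The function names are made up, and treating any trailing two or three lowercase letters in parentheses as a language code is an assumption; an english disambiguator of that shape would be misread.

```python
import re

# A trailing " (xx)" or " (xxx)" in lowercase letters is read as a language code.
_LANG_SUFFIX = re.compile(r"^(?P<name>.+) \((?P<lang>[a-z]{2,3})\)$")


def localized_title(name: str, lang: str) -> str:
    """Article title for a placename as spelled in the wikipedia of that language."""
    if lang == "en":
        return name  # the central (english) version carries no suffix
    return f"{name} ({lang})"


def split_title(title: str) -> tuple:
    """Split a title into (placename, language code); no suffix means english."""
    match = _LANG_SUFFIX.match(title)
    if match:
        return match.group("name"), match.group("lang")
    return title, "en"


print(localized_title("Den Haag", "nl"))
print(split_title("Den Haag (nl)"))
print(split_title("Georgia (U.S. state)"))
```

Disambiguators such as "(U.S. state)" pass through untouched because they do not match the two or three letter pattern.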