User:Phlox/2009 05 notes

11 May

 * Form junk
 * do extended form
 * preloads
 * short descr - set wide 100%
 * article rating
 * gallery section derived from images
 * building street line too big
 * convert Form long over to international strings
 * propagate updates to the subforms
 * need the international strings fleshed out more completely.
 * #if gallery, br clear all before gallery
 * general sources- where to put the footnote?
 * boilerplate text for empty (new article)
 * siblings- maparray break this field
 * Form for image fields
 * Move over international to mediawiki namespace

random accumulating tasks

 * could set up source field as pairs, the first is the cite parameter name, the second is the value. eg cite|title=foo|author=bar = title~foo~author~bar
 * set general info missing properties:
 * fame 1-10, "notability", so that one could conceivably find notable ancestors.
 * female line mechanism? eg- the eve line thing.
 * images, documents, other files.
 * tags
 * run a bot to propagate all event properties uniformly
 * other languages Prototype (a long form?)  Is a lang just a different kind of version page?
 * other stuff
 * multimedia
 * wikipedia references
 * descriptions in alternate languages
 * wedding2-6
 * occupations1-4?
 * education1-4?
 * emigration? or immigration?
 * residences? or mixed with Census event subpage?
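
The source-field-as-pairs idea from the first item in this list can be sketched roughly as follows. This is only an illustration in Python, not wiki template code. The function names are hypothetical, and it assumes that neither parameter names nor values contain the "~" separator.

```python
def cite_to_pairs(cite: str, sep: str = "~") -> str:
    """Flatten 'cite|title=foo|author=bar' into 'title~foo~author~bar'."""
    parts = cite.split("|")[1:]  # drop the leading template name ("cite")
    flat = []
    for part in parts:
        # Split on the first "=" only, so values may themselves contain "=".
        name, _, value = part.partition("=")
        flat.extend([name.strip(), value.strip()])
    return sep.join(flat)


def pairs_to_cite(pairs: str, template: str = "cite", sep: str = "~") -> str:
    """Rebuild the template call from the flattened name/value pairs."""
    items = pairs.split(sep) if pairs else []
    if len(items) % 2:
        raise ValueError("expected an even number of items (name, value pairs)")
    params = [f"{items[i]}={items[i + 1]}" for i in range(0, len(items), 2)]
    return "|".join([template, *params])
```

A value that itself contains "~" would break the round trip, so a real field would need an escaping rule or a different separator.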

junk these
general info, sources

wikia/ semantic engineer info requests

 * "$smwgInlineErrors = false;" so no type errors on blank fields
 * Are there metrics for measuring server load for a page, eg: NewPP limit report?
 * How do I check what the list of environment variables ($smwgInlineErrors) are?
 * What is the expansion limit for client side autocomplete? The forms docs say the default is 1000.  Is that 1000 strings for each field, or 1000 for the entire page?
 * Bug autocomplete (top of screen)?
 * Is there a standard way to stick wikitext in a text field?
 * Images stuck in properties of type page attempt display. Isn't there some way to indicate the right hand operands?  EG image size, no display (prefix with :), etc?  Currently, the default is |frameless|border|text-top]].

random bugs

 * templates that mess up wiki banner
 * template:documentation
 * navbox (probably tbar)

Model

 * approach1: stay with the subpages thing, (this works because move takes all subpages).
 * In place of a single /info page, there are multiple /theory or /version pages. These correspond to an alternate theory of parentage for controversial individuals.  They also more commonly correspond to alternate Gedcom records, many of which are for the most part redundant repetitions of the same material.  Merging them becomes problematic since they may have been embellished with some useful information prior to being republished with a new unique gedcom identifier.
 * To represent a gedcom file, a version would have links directly to the /version page of children, wife etc. Later, someone with a genealogy bot tool would be able to make assisted fixups of these records, collapsing them as possible.  The gedcom ID, and possibly a hash of the key features of the file, would be stored so that further copies would not be redundantly collected over and over.
 * Theory files would be directly queried instead of the main article page.
 * Every complex data item becomes a subpage, eg: events, residences, marriages, children by marriage (?? actually, it may best be an #ask for Father=X, Mother=Y, but it is unclear- there are complexities to doing it this way- see issues section.) Note: this will generate a lot of clutter for an article.  No better way to do it because we don't have aggregated objects (effectively, there are no n-ary relations).
 * Actually, there are n-ary objects: "many-valued properties". However, the current implementation has some limitations that make them unacceptable:
 * You cannot use the special Allows value property to limit the values of any element of a many-valued property.
 * You cannot use the special Display units property to control how a specific element appears.
 * You cannot set the layout of the values; they will always appear as a comma-separated list.
 * You cannot create a timeline query of many-valued properties.


 * variant to 1: All complex objects are subpages, but instead of putting the fields on an info page, put them directly on the main page.
 * objection- this disallows the alternate theory approach, where each /theoryN page is switchable by preference.
 * approach 2: KISS (keep it simple, stupid). The main page carries the properties for the individual. These are the ones accessed
 * These may or may not be copied from the theory/ versions subpages but in any case the field names would be identical.
 * Note, the main page is also the english version. Issue for multilingualism?
 * It doesn't really matter where the properties are stored if they are accessed with a get function. The get function could access what the dominant theory is etc.  But also the theory/ version could be explicitly stated.
 * At first glance this is the most untidy engineering approach. Much nicer if each aggregate data item is stored in a separate page, then common fields like date are not redundantly replicated (birth date, death date, Occupation1 date,...) but are instead a generic date property of a birth, death, or occupation subpage.
 * Problem 1: canonically correct object orientation introduces complexity for template and tool writers, and this is not good given that many are enthusiast/hobbyists. Even simple queries are complex, for instance they will have to do indirection to get any value.  Eg: (#ask (#ask for the birth page name) property from the birth desired)
 * Problem 2: bot tools will find it much easier to deal with simple data models (one page per individual), rather than have to do IO to get at a potentially huge list of separate pages.
 * Problem 3: Competing theories on different facts is an additional dimension of permutations. If our model is a separate subpage for the death event, then what happens when we have an alternate theory of the death event?  Do we create a theory subpage of that death event subpage?  Whoa.
 * Problem 4: Our model for multilingual is to store unique items in a /(lang prefix) subpage. This means that each of the event subpages would also have to have a language subpage.  Not pretty.
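
The "get function" idea under approach 2 might look roughly like the sketch below. It is written in Python purely to show the lookup logic. The page names, the "dominant theory" property and the in-memory PAGES table are all made up for illustration and are not existing properties on the wiki.

```python
from typing import Optional

# Hypothetical stand-in for wiki pages and the properties set on them.
PAGES = {
    "John Doe (1800)": {"dominant theory": "theory2", "surname": "Doe"},
    "John Doe (1800)/theory1": {"father": "Richard Doe (1770)"},
    "John Doe (1800)/theory2": {"father": "Thomas Doe (1772)"},
}


def get_property(base: str, prop: str, theory: Optional[str] = None) -> Optional[str]:
    """Return a property for an individual, hiding where it is stored.

    If no theory is named, the dominant theory recorded on the base page is
    used. Properties missing from the theory subpage fall back to the base page.
    """
    main = PAGES.get(base, {})
    chosen = theory or main.get("dominant theory")
    if chosen:
        subpage = PAGES.get(f"{base}/{chosen}", {})
        if prop in subpage:
            return subpage[prop]
    return main.get(prop)
```

Callers only ever name the base page and the field, so the storage decision (main page versus theory subpage) can change later without touching every template that reads the data.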

Issues

 * No watchlist notification. Too much dynamic updating could present some undesirable side effects. For example, one way of doing children is to not have the parent article declare the children and the children redundantly declare the parent. Instead, have the declaration be in a single place- from the child article.  That way you can query for has father and has mother and get the list for one set of children by one mother, and so on for each mother.  The problem is that the article changes with no watchlist notification, so the user may not be aware that cherished articles are being changed.
 * Interaction with theories. Given the child parent model example above, does the child link to the basepage, or to one of the theory subpages?  If theory page, what if some are dissimilar in unimportant respects, and the child is of theory 2, 5 and 7 versions, but not the others?
 * Maybe it is a good idea to redundantly link these, and fix inconsistencies via manually operated bot. EG: Father theory1 declares who it thinks the children are for each union, and theory2 declares another list.  From the child side, they declare the father BASEPAGENAME from each theory page, enumerating any of the father theories that are excepted.
 * Social factors play into this- you want to have a site that is welcoming, but if you were ever in a bar with military guys, folks will quickly come to blows about whether one guy is being a phoney and claiming he was in a unit or participated in some action that he was nowhere near. So you have all these different versions from different perspectives.  You have the Collective event stored in the Event: pseudo space, then you have all these assertions coming in from all these different articles asserting they were there.  The military history buffs maintain the event integrity by listing the alternate theories etc, but have no interest in mediating these disputes.  So the article declares the dominant theory of who it thinks was at the site, but also does an #ask for all the person Theory pages that claim the person was there.  A bot shows where there are inconsistencies that the contributor may or may not want to resolve.  The point is we make the addition of content relatively painless.  Resolving such problems can be deferred if we have a good manually assisted edit tool.
 * The alternative is to have everything connected, and the military article only knows who participated by the soldiers who assert it. Similarly, the guy interested in just adding the article on his grandfather is not required to create a military battle article with all the particulars of that.  He doesn't want to declare the battle article just to record the information he wants in his grandfather's article.  If everything is connected, a lot of this work can't be deferred.  In addition, the newbie could be subjected to all sorts of community pressure that he is mucking up articles on collective events (military battle) with unfounded assertions (that his great grandfather single handedly took the hill and saved his unit from certain annihilation).
 * Coordination with RDF- we will use accepted genealogy RDF and other data model ontologies where possible, but our focus is not on solving the database heterogeneity problems in the genealogy community. It's a very hard problem generic to all databases.
 * Background: accurate mapping of fields in databases is a hard problem that has not been solved even where the database is used by the same company, using the same software with the same syntax and the same schemas. For instance, the interpretation of the fields is oftentimes dissimilar between operating groups, so analysis is impossible because apples to apples comparisons are impossible- eg: what is/ is not included in net revenue figures?  One operating group takes some operating expenses out of their net, others exclude these costs in order to inflate their apparent success in generating revenue.  The field is the same, the entire record is the same, the software is the same, yet they cannot be meaningfully compared.
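
The bot-assisted consistency check for redundant parent/child declarations discussed earlier in this list could work along these lines. This is a Python sketch over plain dictionaries. How a real bot would fetch the declarations (for example through #ask queries) is left out, and the data shapes are assumptions.

```python
def find_inconsistencies(parent_claims, child_claims):
    """Compare declarations made from both sides of a parent/child link.

    parent_claims: dict mapping a father page to the set of children it lists.
    child_claims:  dict mapping a child page to the father it declares.
    Returns a list of (father, child, problem) tuples for a human to review.
    """
    issues = []
    # Children listed by the father who do not declare him back.
    for father, children in sorted(parent_claims.items()):
        for child in sorted(children):
            if child_claims.get(child) != father:
                issues.append((father, child, "child does not declare this father"))
    # Children who declare a father that does not list them.
    for child, father in sorted(child_claims.items()):
        if child not in parent_claims.get(father, set()):
            issues.append((father, child, "father does not list this child"))
    return issues
```

The function only reports mismatches and never edits anything, which matches the idea that resolving them can be deferred to a manually assisted tool.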

Switchable views

 * Scope of investigation: Limited. Investigate this only to the extent of determining that we are not going to paint ourselves into a corner for future options.  Find out how I would likely do it in the future.
 * The solution probably has to do with the way we do user preferences, and so the only way we have of doing that sort of thing is through css and .js - like how we do the date formatting thing. We set a state, the css sets the formatting based on the state.
 * Basic technical field of battle challenge: Due to server loading issues, you can't affect template code or otherwise generate custom articles per user. One way is to generate all views, and unhide the one that the user prefers.  If the user disagrees with the dominant theory, they put in their preference of theory subpage as a property of the basepagename. The value is expressed in the html, and the client side .js code turns on or off the rendering of the data depending on which theory is preferred.  That is one way of doing it.
 * The template emits span display none for the non dominant theories. So folks who are not logged in or have no account see the dominant view.  Classes are assigned to these spans so that, alternately, they have the display none overridden and the dominant spans hidden.  This is the work of the .js, which consults a hidden list of which theories are preferred by which users/ club names.  Your preferences might state- show me the dominant view except in cases where this list of people think it should be something else: User1; My User; Project name; local genealogy group name.  These groups might arbitrate among themselves what the more correct theory is.  So subgroup collaboration can express an effective minority position.
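
The selection rule described above (dominant view unless someone on the viewer's list prefers another theory) can be sketched as below. The real thing would be client side .js. Python is used here only to pin down the logic, and the function names and data shapes are hypothetical.

```python
def choose_theory(dominant, overrides, trusted):
    """Pick the theory to show for one article.

    dominant:  name of the dominant theory, shown by default.
    overrides: dict mapping a user or group name to the theory it prefers.
    trusted:   ordered list of users/groups whose preference the viewer
               accepts; the first one with an opinion wins.
    """
    for name in trusted:
        if name in overrides:
            return overrides[name]
    return dominant


def span_styles(theories, shown):
    """Inline style per theory span: everything but the chosen one is hidden."""
    return {t: ("" if t == shown else "display:none") for t in theories}
```

An anonymous reader has an empty trusted list, so choose_theory falls through to the dominant theory, which matches what the template emits by default.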

Multilingual
The forms code has some autotranslate thing that I didn't read up on. Have no idea what they are doing, but in any case, our problem is bigger, because now views must be switchable not just with respect to the user's theory preference, but also their language preference. Whoo boy.
 * approach1: Main page alternate languages can be base pages (eg for alternate fonts- chinese, greek, cyrillic), but they point to the subpages of the english BASEPAGE, using the language prefix as a subpage eg /fr. Example info page approach for the obama article.

RDF Ontologies

 * GEDCOM RDF mapping

SMW stuff that doesn't work

 * Text input allows text values from a form that will mess up display of the page
 * No way to input wikitext into a field.
 * Hackaround: It would be possible to hack set values, eg: ((wp>USS Monitor)) could be parsed and could be displayed.
 * The docs say the following doesn't work, so don't use examples from some sites that may use them:
 * Inverse properties, eg siblings
 * Domain and range restrictions, eg Father
 * Number restrictions and functional properties
 * Transitivity (?)
 * date type requires full date. This is lame because oftentimes we have year only, year month, or circa type dates. (hackaround is to offer year, month boxes for partial.)
 * Forms don't support all table options. eg. background color style, see Form:Demo1, the place subbox should be background light green.  Possible hackaround is to use html td code, and set the css fieldset and legend backgrounds to transparent.  Fix is to look at what Yaron is doing in the php, but I won't get to that for ages, and this is sort of cosmetic stuff anyway.  There are complex browser issues since IE apparently does things differently and special case code is necessary (what a shock). For instance here.
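
For the partial-date limitation noted in this list, one possible representation keeps year, month, day and a circa flag as separate values, in the spirit of the year/month boxes hackaround. The sketch below is Python. The input format (YYYY, YYYY-MM or YYYY-MM-DD, optionally prefixed with c., ca. or circa) is an assumption for illustration and is not what the date type accepts.

```python
import re
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    circa: bool = False


# Optional "c." / "ca." / "circa" prefix, then year, optional month, optional day.
_PATTERN = re.compile(
    r"^(c\.|ca\.|circa)?\s*(\d{1,4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?$",
    re.IGNORECASE,
)


def parse_partial_date(text: str) -> PartialDate:
    """Parse a year-only, year-month, full, or circa date into its parts."""
    match = _PATTERN.match(text.strip())
    if not match:
        raise ValueError(f"unrecognised date: {text!r}")
    circa, year, month, day = match.groups()
    return PartialDate(
        year=int(year),
        month=int(month) if month else None,
        day=int(day) if day else None,
        circa=circa is not None,
    )
```

Month and day ranges are not validated here. The point is only that the missing parts stay missing instead of being padded out to a full date.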

Missing special pages

 * No upload ontology
 * no upload vocabulary

Bug list

 * Bug list


 * N-ary does work? Bug11411 mentions a limitation: you can't have an n-ary relation composed of an enumerated type.

Bugs I ran into

 * complex layout causes form to "forget" field attributes. EG: Partial form set death does not pick up default values for death date-approx or calendar. Works fine on main form.  Image width has correct values set in property, but if it does not redeclare its property, it will not display the pulldown list.
 * workaround: declare property= on the field statements
 * declare does not work if there are spaces after (perhaps before) the equal sign.

PHP fatal error in /usr/wikia/source/releases_200905.1/extensions/wikia/WikiaSpecialUploadInfo/WikiaSpecialUploadInfo.php line 19: Call to a member function getTitle on a non-object
 * date displays time 0:00:00 on second, third, fourth... forms when editing with Form:person long form. If the page is instead edited with form:set death or form:set wedding1, then the value is set properly.
 * workaround: clip the minutes seconds using the #time function in the set templates. This is a temporary patch.  #time needs a lot of fixup to handle single digit years etc.  The real solution is to fix whatever the extension is doing.  May be a bug, or caused by it getting confused by some of the table formatting stuff I put in the form to make it look less voluminous.
 * Upload bug when loading geer jpg

Limitations & workarounds
'The following section is obsolete after the discovery of how to use stringfunctions to crack the page value so that a string may be extracted. See "Major discovery" below.'
 * As of the time of this writing, for autocompletion to work, it needs pages, not strings. Setting autocompletion on property= some page property works fine, but not with some string property.  Properties do not pick up subproperties, so it seems to me that categories is the best way to go about this- otherwise you have a huge flat namespace that you have no option of segmenting in the future (eg just fill in with counties in Scotland).
 * Autocompletion is fine locally when there are a small number of values. With large numbers, the page load can be very slow, and you may run into a 1000 item max (not sure this is per page or per field).  Remote autocompletion has no limits, the page loads faster, but the autocomplete is slower.
 * workaround plan: autocompletion on category|remote autocompletion.
 * As of the time of this writing, autocompletion box displays at the upper right corner of the page. It is hidden if the page is scrolled.  This is very bad on a large page.
 * workaround: position all autocompletion fields on the first screen full of data.
 * Do not place free text box at top, move it to end of article. This will make regular wikitext reading of the article disconcerting for typical WP users.
 * recommend users go to smaller forms for most of their editing.
 * If a property is a page, then it can't be used for conditional ifs. If you need it to be a page, then to get its value you also have to store it as a string property. This makes code complicated.  For example, "La Salle county (Texas)" can't be used in a line that displays city, county, state, because you duplicate Texas.  So you need a piped link that displays just "La Salle county".  You can make the short name a property of that article, but if the property is a page, you can't do the decoration with square brackets.  Same problem with displaying surname.  The article name has (surname) appended.  This shouldn't be displayed, but to strip it, you need access to the string value.
 * Never store articles as pages. Always use strings and use square brackets to display as a link.
 * Counter argument: Properties of type page allow redlinks to go to a default form input. We are doing people that way, so maybe we should do everything that way, and just use the & template to do the dereferencing.
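
The stripping described above (dropping the parenthetical qualifier from an article name while still linking to the full name) amounts to something like this. It is a Python sketch of the string handling only. The function names are hypothetical, and it assumes the qualifier is always a single trailing parenthesised group.

```python
import re

# A trailing "(...)" group, with any whitespace around it.
_SUFFIX = re.compile(r"\s*\([^()]*\)\s*$")


def short_name(page_name: str) -> str:
    """'La Salle county (Texas)' -> 'La Salle county'."""
    return _SUFFIX.sub("", page_name)


def piped_link(page_name: str) -> str:
    """Link to the full article name but display only the short name."""
    return f"[[{page_name}|{short_name(page_name)}]]"
```

With the name available as a string, a line showing city, county, state can use short_name for the county and avoid repeating the state.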

Weird/ anomalous stuff

 * returns lots of pages. Does it recognize pagename as a key word? or is it undefined when first operand is null?
 * form weird syntax:  in page http://discoursedb.org/w/index.php?title=Form:Author&action=edit
 * "author" is the feeding template name for the form, and first name/ last name are parameters from it.
 * Gotchas
 * if you forget to insert the = after the property name, it will print out the field before the value:
 * If you put a link in a text field, (eg: [http:blah blah] ), the property will not be stored, and subsequent #ask's won't work. This may be true if there is any wikitext in the field.  The template page will display the property set statement, since it is not executed.  Solution may be to urlencode all text fields, then unencode them.
 * When you save a page that has properties, it nukes all previously saved properties for the page. So if you remove a line that set the property, the property won't exist anymore when the page is re-saved.  So on every save, every set property statement must execute with prior (or new) values, or they go away.  Weird.
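
The urlencode idea for text fields noted in the gotchas can be checked with a quick round trip. The sketch below uses Python's standard urllib.parse rather than wiki parser functions, so it only shows that the encoded form contains none of the characters that upset the property parser and that the original text comes back intact.

```python
from urllib.parse import quote, unquote


def encode_field(text: str) -> str:
    """Percent-encode everything except letters, digits and '_.-~'."""
    return quote(text, safe="")


def decode_field(stored: str) -> str:
    """Reverse encode_field for display."""
    return unquote(stored)


raw = "[http://example.org blah blah] and [[USS Monitor]]"
stored = encode_field(raw)
# No brackets, bars or spaces survive in the stored value.
print(stored)
print(decode_field(stored) == raw)
```

The cost is that the stored value is unreadable in queries until decoded, so any #ask output would need the decoding step too.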

Cool stuff

 * Timeline output (for all subprops of event date)

Complex types

 * Marriage eg. Edward Riggs (1589)/Holmes-Riggs
 * residences (big because of censuses)
 * All other events with multiples: Occupations, Education
 * Migration event emigration: (from country, ship, ports), moving: mode of transport, reason, motivation
 * Citations? whoa.  That certainly is complex, but will the user have to create a separate page for every one of these jokers?

Small pages
Something goes against the grain about having all these tiny pages sitting around. The tough thing about these events is naming them.
 * 1) They aren't owned by one person- eg the husband. What if it turns out that the name of the person gets changed etc. Do they then have to move 20 of these bitty pages every time they rename?  Hmmm.  Maybe a bot can do this.  And who owns them?  Does the Husband own the marriage event or the wife?  If the naming uses both people's names, aren't you just doubling the chances of a move due to a changed birth or death date?  OK, maybe these events are owned by one of the parties- doesn't matter who- it's just a unique name.

Implementation

 * Opening via form on redlink- To property "occupation"(s), add value has default form.
 * ??pagename should be a subpage, but user has entered the page name. So they could muck it up. OK.  Maybe they enter a string, then save the form.  On next form load, the template doesn't display the string, and instead creates the real field of type Page, decorating it with the proper prefix.
 * Note that for shared events like marriage or migration, say the husband already created a marriage page. Wife needs to poll the husband to see if a page already exists and link to that first.
 * Maybe there is a smarter form way of doing this.

Pro/Con

 * Pro1: Removes a tremendous amount of clutter from main page.
 * Pro2: Tidies the Property namespace (no occupation8-locality, etc.)
 * Con1: This could potentially disrupt workflow. You have to open a new page to type in marriage info.


 * Conclusion1: Being a subpage doesn't imply anything. It has to go somewhere, and being top level means it will collide with other similar names.  So keep these complex types as a subpage and use some logical naming convention; if folks want to get fancier later, they may do so.

Tips

 * Major discovery- you can use string functions on a returned page, and it turns out that it is just wikitext for a link with bar and right hand value the same as the left. So you basically divide the number in half, minus the decoration characters, and you have the substring offset.
 * returns:
 * This means we can use pages with a great deal more freedom. They are never opaque, even redlinks.
 * Namespaces not 0: For pages not in namespace 0 the string is at offset 2, not offset 3 as for ns:0. I don't know why you'd ever want to use type page for an image, because it tries to display it. This might make sense for a wiki with images that are already the desired display size, but they seldom are.  Examining the decoration, I see that they are constrained, so this is ok to use.  The decoration currently is:  |frameless|border|text-top]].
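
The arithmetic behind the discovery above, written out as a Python sketch. It assumes the returned value looks like [[:Name|Name]] for namespace 0. That exact wikitext is a guess consistent with the offset of 3 noted above, not something confirmed from the extension source. The offset is a parameter so the namespace case (offset 2) can be tried as well.

```python
def page_name_from_link(link: str, offset: int = 3) -> str:
    """Extract the page name from link wikitext whose two halves are identical.

    Assumed layout: <offset leading characters> name "|" name "]]".
    """
    # Remove the leading decoration, the bar and the closing "]]",
    # then halve what is left to get the length of one copy of the name.
    name_len = (len(link) - offset - 1 - 2) // 2
    name = link[offset:offset + name_len]
    # Sanity check: rebuilding the link from the name must give the input back.
    if link != f"{link[:offset]}{name}|{name}]]":
        raise ValueError("link does not have the assumed name|name layout")
    return name
```

The sanity check matters for values such as images, where the right hand side carries extra decoration and the halving trick does not apply.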


 * Subproperties means you can search for groups. EG if birth date, death date, marriage date are all subprops of event date, then searching on event date picks up hits on all of them.
 * Forms have a preload option
 * There are set and declare statements:
 * default form for a namespace is set at genealogy:File, genealogy:main and so on...

gedcom

 * very elaborate gedcom origin record
 * with _UID, !LINKS, !OCCUPATION: embedded GEDCOM field names
 * Raw gedcom with sources, evidence/ event types, indirection examples

DNA / YSTR / Haplogroup

 * Family tree projects
 * - Trout

Development sequence

 * First work used subpages to store generic properties. Form:SMW-test3 created an event subpage eg George Spencer Geer (1836)/death
 * Since such compound structs are not possible from the main page, this was the best "clean" approach for data aggregates eg death.date, death.state etc. Although it is best practice from an engineering perspective, and there would have been an economy of property names, it was abandoned for many reasons not the least of which it would have been more difficult for novices to access the values.  The approach inherently requires indirection, and therefore code that references data must do a couple #ask's.  Further discussion may be found above in the Model issues section.
 * Main page forms and partial forms were shown to be able to edit a normal article, with large volumes of properties. Forms were shown to be reformattable for more attractive UIs.  Partial forms were shown to be able to produce bite sized chunks so that the user is not overwhelmed.  New article form "short form" was demo'd for simplicity of initial article creation.
 * Double flushing emerged as a problem. Solution might be to merge the set variables templates into the infobox display.  eg:
 * infobox header,
 * set birth cell template
 * set baptism cell template
 * set death cell template
 * This design would eliminate double flushing, but for items displayed elsewhere on the page (eg gallery section with images from birth, baptism, weddings)- these would require a double flush.
 * Re-ordering: editing with partial templates will reorder the sequence of the infobox cells. EG using form:set death would place the death cell prior to the birth cell.  Bummer.
 * Alternatively, one could tie the setting of elements directly to the display of those elements. So if there were a gallery section, Wedding1 photos would not be set with the wedding1 dates and location, but in the set gallery template.  If everyone agreed to a standard layout, then this might be workable.  From the researcher standpoint, it makes data input workflow more haphazard, since it would not be possible given the current code to display the wedding1 data together, since they would occur in different templates.
 * As of 10 May, the idea is to tolerate the double flushing at least until the reordering problem is addressed.
 * "Everything" Form (Form:Person long form) cannot be burdened with rare and voluminous items (eg occupations 1-8, Military events 1-8, Weddings 1-8, Residences 1-8 and so on). The idea is that for the main events, indicate the first event on the main form and use partial forms for the overflow.  Less common events like Bar Mitzvah, sealing, adult baptism etc won't have any mention on the main form.
 * can forms be launched from forms?