Forum:Initial Info page conversions

I have used a bot to convert some articles from Info pages to Facts pages. These were mostly articles I created in 2007 for the family trees of Obama and former US Vice President Cheney. The full list of converted articles is in the hidden category category:Upgraded from info page.

Since AnimeBill bravely volunteered to have his articles converted, I will continue the trial with his articles once I have finished converting the Info pages I created that are not sourced from Wikipedia. We'll see where we stand after this initial trial. If anyone wants to volunteer articles they created, please say so below. Mine and AnimeBill's will keep me busy for a while, but once the kinks are worked out, conversions will go faster.

No data is discarded; simply reverting the article will restore it to its former Info page state.

Some notes on what to expect for various fields follow. Regardless of what they say, I'd like to hear about anything the bot is not converting that it reasonably should be able to. There was a large degree of variation in how people used Info fields, and entries that diverge from the standard patterns won't be detected by the pattern-recognition algorithms. It is often a simple matter to correct this, but I need to see examples of such cases to fix them. Some variations are so rare that it is far quicker to adjust the affected pages manually so they are recognized.


 * Dates: For all events, dates in the date field should be converted perfectly in all cases. Three date layouts are recognized: ISO (Y-M-D), D M Y, and US/Canada (M D, Y). Spelled-out months and 3-letter month abbreviations are recognized. In most cases the allowed separators are slashes, dashes or spaces, with an optional comma after the day. If this converter handles these cases adequately, it is a simple matter to create a version that handles 3-digit dates. For now, I am skipping BC and 2-digit dates. When we know how many there are and how they are formatted, a converter can be created for them.
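
The date recognition described above can be sketched in Python. This is not the bot's actual code (the real conversion used AWB regexes and #time); the helper names `parse_info_date` and `to_facts_date` are hypothetical, numeric months are assumed to appear only in ISO dates (so "3/4/1850" is never guessed as D M Y vs. M D Y), and the "D Mon YYYY" output style is an assumed target format.

```python
import re
from datetime import date
from typing import Optional

# Full month names and their 3-letter abbreviations -> month number.
_MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
                "august", "september", "october", "november", "december"]
MONTHS = {}
for _num, _name in enumerate(_MONTH_NAMES, start=1):
    MONTHS[_name] = _num
    MONTHS[_name[:3]] = _num

SEP = r"[/\-\s]+"  # slash, dash, or whitespace between date parts

# Assumption: numeric months only in ISO; D M Y and M D, Y need a month name.
ISO_RE = re.compile(rf"^(\d{{4}}){SEP}(\d{{1,2}}){SEP}(\d{{1,2}})$")
DMY_RE = re.compile(rf"^(\d{{1,2}}),?{SEP}([A-Za-z]+){SEP}(\d{{4}})$")
MDY_RE = re.compile(rf"^([A-Za-z]+){SEP}(\d{{1,2}}),?{SEP}(\d{{4}})$")


def _month(token: str) -> Optional[int]:
    if token.isdigit():
        return int(token)
    return MONTHS.get(token.lower())


def parse_info_date(text: str) -> Optional[date]:
    """Return a date for the three supported layouts, else None.

    Only 4-digit years match, so BC and 2-digit-year dates return None.
    """
    text = text.strip()
    for regex, order in ((ISO_RE, "ymd"), (DMY_RE, "dmy"), (MDY_RE, "mdy")):
        m = regex.match(text)
        if not m:
            continue
        parts = dict(zip(order, m.groups()))
        month = _month(parts["m"])
        if month is None:
            return None  # unrecognized month word, e.g. "Sept"
        try:
            return date(int(parts["y"]), month, int(parts["d"]))
        except ValueError:  # impossible dates such as 31 Feb
            return None
    return None


def to_facts_date(d: date) -> str:
    # Hypothetical "D Mon YYYY" output; the real Facts format may differ.
    return f"{d.day} {_MONTH_NAMES[d.month - 1][:3].title()} {d.year}"
```

For example, `parse_info_date("Nov 3, 2009")` and `parse_info_date("2009-11-03")` both yield November 3, 2009, while `"3/4/45"` returns None and would be left for a later converter.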


 * Places: Place names that were categorized (town, county, state) are moved to the corresponding Facts property. "Event place" fields (Birth place, Death place, ...) were moved to event_places-other. For "Event" fields (Baptism, Marriage, ...), any text not recognized as a date is assigned to places-other. No attempt is made to assign uncategorized places to fields (e.g. places with "county" in their name). Text going into a places-other field is split into separate subfields (using a +) wherever a comma appears, except for commas inside square brackets. Square-bracket links are removed from all places (as well as from all other parameters of type "Page").
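
The comma-splitting and link-removal rules for places-other can be sketched as follows. The helper names, the choice to keep a link's target rather than its label, and representing subfields as one string joined with a literal "+" are assumptions for illustration; the actual Facts subfield syntax may differ.

```python
import re

# Matches [[Target]] or [[Target|label]]; assumption: the target is kept.
LINK_RE = re.compile(r"\[\[([^\]|]*)(?:\|[^\]]*)?\]\]")


def strip_links(value: str) -> str:
    """Remove square-bracket link markup, keeping the link target."""
    return LINK_RE.sub(lambda m: m.group(1).strip(), value)


def split_places_other(text: str) -> str:
    """Split on commas outside square brackets and join the pieces with '+'."""
    pieces, current, depth = [], [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        if ch == "," and depth == 0:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))
    # Strip links and surrounding whitespace from each piece; drop empty pieces.
    cleaned = [strip_links(p).strip() for p in pieces]
    return "+".join(p for p in cleaned if p)
```

Splitting happens before links are stripped, so the comma inside `[[Springfield, Illinois]]` stays with its place instead of creating a new subfield.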


 * Sources: Source fields contain formatting such as bullets, which is kept as originally encoded. No attempt was made to separate sources into distinct fields.


 * Short name: In most Info pages, Short name was equal to the page name. It has been overridden to given name + surname, which is not correct in some instances but is more correct than using the base page name or leaving it blank, since it is used as the title of the infobox.

Technical notes
 * This was implemented by substituting (subst/transclude) the Info page directly into an includeonly-protected portion of the tree subpage. The Info fields were then massaged into Facts format in a single pass using AWB (AutoWikiBrowser) regular expressions. The #time function was used for some of the date conversions.
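
The single-pass regex step can be illustrated with a small Python sketch. AWB runs .NET regular expressions, so Python's `re` is only a stand-in here, and the field mapping `FIELD_MAP` (including target names like `birth_places-other`) and the `rename_fields` helper are hypothetical, not the actual AWB rules.

```python
import re

# Hypothetical Info -> Facts field-name mapping; the real rules were AWB regexes.
FIELD_MAP = {
    "Birth place": "birth_places-other",
    "Death place": "death_places-other",
}

# One alternation over all Info field names, anchored at the start of a
# template parameter line. [ \t]* (not \s*) avoids swallowing newlines.
FIELD_RE = re.compile(
    r"^\|[ \t]*(" + "|".join(re.escape(name) for name in FIELD_MAP) + r")[ \t]*=[ \t]*",
    re.MULTILINE,
)


def rename_fields(wikitext: str) -> str:
    """Rewrite '|Old name = value' lines as '|new_name=value' in one scan."""
    return FIELD_RE.sub(lambda m: "|" + FIELD_MAP[m.group(1)] + "=", wikitext)
```

Combining every field name into one compiled pattern means the page text is scanned once, rather than once per field; fields not in the mapping (such as `|Notes=`) pass through unchanged.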

Any questions, let me know. 19:51, November 3, 2009 (UTC)