Forum:SMW

'''This page is intended to be the main question and discussion forum for Genealogy:Semantic MediaWiki (which has shortcut link: "SMW"). It is starting with copies of extracts from existing discussions. Please feel free to start new sections about specific pages on their talk pages, but please come here for at least a link back if the discussion broadens at all or if it might interest other contributors.

It is preferable not to have user-talk pages containing more than the briefest of question-and-answer dialog(ue) on SMW. If your question could interest two or more contributors, please put it here (with heading and edit summary both meaningful) then use user talk pages merely to tell your respondent(s) that the question is here.'''

Main information pages (list to grow as their number does):
 * Help:Semantic MediaWiki
 * Query (redirect)
 * Unused properties
 * Property talk pages
 * Type talk pages
 * Form talk pages
 * Concept talk pages

Requested extensions
I requested that wikia central dudes enable the following extension on genealogy. It is used by several other wikia including psychology:

Semantic Mediawiki.

They may ask you if it is ok. This may allow us to have forms that would help ordinary people access info pages. We will need it enabled to see if this is so or not. It also allows semantic searches, such as searching for a person located in a particular geographic area. Somewhat useful for us, no?

I also requested them to look into supporting extension "link attributes"- this is useful for microformats- allowing us theoretically to interact with other internet applications. Very experimental, but something may come of it. - ~  Ph l o x  01:08, 30 April 2009 (UTC)


 * Sounds like progress. See File:Smw-quick-reference.png. — Robin Patterson (Talk) 04:17, 30 April 2009 (UTC)


 * I played with it for a few hours and I couldn't break it very much.  More here.  Robin, I am very excited.  I was writing about the semantic wiki stuff a while back but you know, this may well be ready for prime time.  I am cautiously optimistic that this can be moved over to in the coming months.  If folks agree that we want to make the jump, I'll do the bot runs and help folks convert templates.  There may be a few hiccups but well worth it.  Worst case, we revert everything and we are back to normal.  - ~  Ph l o x   08:56, 30 April 2009 (UTC)

Properties
I see you are creating properties. Very good. I guess these can be used to create couples. For instance, shows people who married their first cousin, but not who they married. Should really be displayed as Arie Korver x Anna Korver. If Arie has the property "married cousin" this can be done. rtol 10:39, 1 May 2009 (UTC)


 * You would be able to do this dynamically via a query. It would be a single #ask, so for all practical purposes it would look like a property anyway.  I left some breadcrumb links over at the help forum in the "future of info pages" thread.  There are some example wikis that have used it and some good examples/discussions.


 * In db design the rule is not to redundantly code something that can be derived (easily). To do otherwise introduces database inconsistencies.  With that said, wikis are fundamentally unstructured databases.  Anyway, that is my first reaction to the cousin married thing.  We could also hard code stuff like property lived in New Hampshire.  I rather think that "had residence" whose property was "has location in state" is a more proper way to encode this kind of structure.


 * Everything I am doing now is very makeshift, so don't assume too much about what is happening with it. I haven't formulated strong opinions yet.  I plan to be doing a few approaches and then will release a tool so that everyone can convert over stuff as they please.  I am mostly interested that we do this right, and that our decisions will establish familypedia as the premier semantic genealogy site.  Everyone else is playing the quantity game of piling on gedcoms.  We need to play with different rules if we are to dominate.  Our strength is collaboration and quality- that will in the end crush everyone.  So I am of the opinion that we should move towards having a very friendly version of a hard core evidence based system that operates from the notion that genealogies represent multiple and often conflicting believed realities.  All of our so called facts are really just assertions with varying degrees of certainty.  To add a formal semantic layer over fundamentally contradictory knowledge is just GIGO (garbage in, garbage out).  Anyway, that is my thinking as of today.  - ~  Ph l o x   16:20, 1 May 2009 (UTC)


 * Got it, and look forward to these enhancements. rtol 18:26, 1 May 2009 (UTC)

Property names
I know that it is early days, but I have a suggestion about "Property names": there is an asymmetry in using "Spouse", "Spouse2", "Spouse3" etc which complicates the templates. I would prefer the series to be "Spouse1", "Spouse2", "Spouse3" etc, and so on for the various "multivalued" properties (eg AFN). Thurstan 04:18, 2 May 2009 (UTC)
 * Yeah, I did that ordinal numbering thing in a clumsy way. I am on it. - ~  Ph l o x   05:31, 2 May 2009 (UTC)

How the new thing works
This article employs the SMW extensions for all input: George Spencer Geer (1836). I don't have support for all the fields yet, but this will give you an idea of the kind of UI that is possible. Note the "edit with form" on the toolbar. Click that and you can edit all the fields known to man for that person. At the bottom of the page, I have a collapsed edit box that allows the user to edit just selected items (for example, if they were only researching baptism records, they'd click baptism in that box). When an article is first created, the user is not bombarded with every choice possible.  Instead I give them a pared down list.  Click on the wife or children redlinks to see an example of that.  At the bottom of the page you will see a Facts list with the equivalent of all the info page fields displayed.  This can be turned off if desired on a per page basis.

As I said, not done yet by any stretch of the imagination, but this gives the general idea of how this mechanism works. It's a little bit of voodoo, but the docs on the Semantic MediaWiki site will explain what I am up to - ~  Ph l o x  10:16, 9 May 2009 (UTC)

Pioneer George of New York
Page looks impressive, to say the least. I see no toolbar as described, but I saw the thing that opens a can of yummy spaghetti.

I've posted his page to my profile on Facebook, with the following introduction:
 * Semantic MediaWiki is moving Familypedia up a gear! (This page is a "Beta" version but pretty impressive.) Swiss ancestry certainly has its good points.

— Robin Patterson (Talk) 11:01, 9 May 2009 (UTC)
 * By edit bar, I meant the wikia bar that says edit this page, leave a message, history, etc. Before the "Edit this page" is the Edit with forms, so you obviously found it.


 * The thing I found was a show/hide thing about half-way down. I'll try top left next time. — Robin Patterson (Talk) 13:48, 10 May 2009 (UTC)


 * Editing with this long form is probably not the preferred mode for users. Perhaps they will learn to click just on the bit that they want to change, like birth, wedding, children, and what not.


 * Perhaps you did not see, but on each of the events is a place for images. Adding an item here guides the user through an upload, and the resulting file name is stored with the article for the contributor.  Further, the size of the image can be set by default, so they don't have to know anything about how to jam it into an infobox.  In similar ways, we can control the decoration of names of counties, surnames and so on.  I am not going to mess with the minutiae yet.  I just want to press on to make sure it can do all the hard stuff, or whether there will emerge some hidden obstacle that blocks us from using SMW.  There is a bunch of stuff I find annoying, but all in all it is much more usable for normal contributors, and I think sophisticated users will welcome the departure from the frail and impenetrable obscurity of my Info templates code.


 * - ~  Ph l o x  16:04, 9 May 2009 (UTC)


 * Please don't disparage the info templates code. Great work, which now is well enough documented to get most new users diving in successfully without specific invitation. You will have to persuade them that there's a better way. — Robin Patterson (Talk) 13:48, 10 May 2009 (UTC)


 * If the three wikis you've been trialling or discussing SMW on are not enough, try http://thirdturn.wikia.com/wiki/User_talk:DaNASCAT#blank_pages for a starter into that real-life subject to see if maybe Uberfuzzy can help you get some value from their experience. I listed a couple of other possibles on the Help:SMW page. — Robin Patterson (Talk) 13:48, 10 May 2009 (UTC)


 * Well, as a doting parent of course I am proud of the info pages approach. They are powerful and I am pleased with their success with new users.  SMW encoding may be an unwelcome transition for some and of course it is up to the community whether this radical departure from info pages merits attention.  - ~  Ph l o x   17:33, 10 May 2009 (UTC)

More on George Geer
After just one save, I saw much of what I had just done on the display. — Robin Patterson (Talk) 10:02, 12 May 2009 (UTC)

Autocompletion

 * Note that Image and "had people" fields autocomplete with existing files from those namespaces. Similarly, if you click on the picture of Geer, then edit with form for that image, you will see the county is autocompleted.


 * It's nice, but the way we have been using categories is not perfect for getting these lists of correct values. Anything that is a subcategory of a state or county is added, so what we need is a set of categories whose members are only county articles, only locality articles, and so on.  It doesn't matter how deeply they are nested, so we could have


 * County articles
   * County articles- United States
     * County articles- Texas
     * Borough articles- Louisiana
   * County articles- United Kingdom
     * County articles- Scotland
 * These would be hidden categories added by bot. Hidden because we don't particularly need folks adding crud into them.
 * - ~  Ph l o x  23:56, 12 May 2009 (UTC)
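
The "doesn't matter how deeply they are nested" idea can be sketched in a few lines of Python. This is only an illustration: the category names are the ones suggested above, but the nesting, the article names and the helper `members_of` are assumptions made up for the example, and it says nothing about how SMW or the forms extension actually builds its autocompletion lists.

```python
# Hypothetical category tree: category -> its direct subcategories.
SUBCATS = {
    "County articles": ["County articles- United States", "County articles- United Kingdom"],
    "County articles- United States": ["County articles- Texas", "Borough articles- Louisiana"],
    "County articles- United Kingdom": ["County articles- Scotland"],
}

# Hypothetical direct members (article names are illustrative only).
MEMBERS = {
    "County articles- Texas": ["Harris County, Texas"],
    "Borough articles- Louisiana": ["Orleans Parish, Louisiana"],
    "County articles- Scotland": ["Fife"],
}


def members_of(category, seen=None):
    """Collect the articles in a category and in all of its subcategories, however deep."""
    seen = set() if seen is None else seen
    if category in seen:  # guard against accidental category loops
        return set()
    seen.add(category)
    found = set(MEMBERS.get(category, []))
    for sub in SUBCATS.get(category, []):
        found |= members_of(sub, seen)
    return found
```

Because only county-type articles are ever placed in these categories, asking for `members_of("County articles")` would give a clean list of valid values, with no stray people or towns mixed in.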

Definition of "locality" etc
A locality is a general term that refers to a settlement, village, town, city, municipality or other local community or governmental unit. (Initial definition on page.)


 * That's artificially narrowing the meaning of "locality". The Nullarbor Plain and the North Magnetic Pole are localities in normal English. Births can occur there. One easy way to restore a fair semblance of correspondence to plain English would be to replace "refers to" by "includes". A better long-term way would be to cut it right down to a simple dictionary definition such as "the position or site of something" (Concise Oxford Dictionary, 10th ed.). — Robin Patterson (Talk) 07:02, 11 May 2009 (UTC)


 * Why "locality" (four syllables) instead of "place" (one)? — Robin Patterson (Talk) 07:02, 11 May 2009 (UTC)
 * A contributor could construe a county, province or landmark like a building as a place. The location hierarchy is
 * street address(multiple, including building names); (I really don't like this name- I want something that covers buildings and landmarks like public squares, but since building names are allowed in these lines by internet convention, I went along with it)
 * locality; (this is what the microformats community uses, and they picked it up from other internet standards to mean anything from a community like Riccarton in Christchurch, which is not even a city or distinct settlement, to Shanghai/Moscow/New York, megalopolises that have consumed neighboring cities. Maybe multiples should be allowed, eg: Brooklyn, New York City?)
 * county;
 * "state" (province/oblast/department- "sub government unit"); (Isn't Canterbury on the South Island one of these?  It functions like a county, but there is nothing like a province level between it and the national government)
 * country
 * Syllables may not matter any more. Nobody will be typing "locality" because of the forms interface.  "Place" is ambiguous, and suggests it would be ok to put "Texas" in there.  I don't really care what we call "locality"- it just needs to be broad enough to cover something really small to something enormous, but still less than a county or province.  If you want to call it something different, play around with the names and suggest something else.  It isn't easy, and though locality is clumsy, it is better than the alternatives- eg town.  Shanghai is not a town, and neither is Riccarton. - ~  Ph l o x   17:39, 11 May 2009 (UTC)

OK, if nobody types it much if at all, syllables don't matter. "Locality" has more feeling of smallness than "place". But the idea that it must be inhabited won't do, for reasons I stated. A landmark such as Mount St Helens can have births on it now that the ash has cooled. So we need more terms or maybe more variety of terms in our definition. I guess we also want the user to enter something unique. Riccarton (which was a borough until 1989) and Canterbury (which has been a local govt region since then and is a recognized genealogy region) and Brooklyn (which could be in Wellington although the prominent one is a whole county of New York) are not..... — Robin Patterson (Talk) 15:35, 12 May 2009 (UTC)


 * Ok. We can add something for uninhabited places at any time.  Let me know the field name and I'll stick it in.  Locations have been thought about a lot in the community of folks who concern themselves with these various metadata schemes.  Of course we could introduce something like "landmark" that could handle buildings, mountains, squares and what not.  For precise locations, we can use the coordinates, and those are very nice because we will be able to view stuff like gravesites and former residences from 300 feet up using Google Earth and the other virtual earth programs.  - ~  Ph l o x   16:56, 12 May 2009 (UTC)

Other Wikia sites; forms extension
Which Wikia sites have progressed further than you have? — Robin Patterson (Talk) 15:35, 12 May 2009 (UTC)
 * I am not aware of what the other wikia are doing. My initial survey was that hardly any of those that had asked for the extension were using it much, and some had even neglected to ask for the forms extension.  The forms are not normal pages and their interaction with templates is unusual compared to normal MediaWiki functionality.


 * I need to tidy up the forms a bit- these are getting closer to the stage where some of the more technical dudes can figure out how to tweak them. It's just table fiddling- not too bad really.  - ~  Ph l o x   16:56, 12 May 2009 (UTC)

Querying
I have not done anything with querying yet. This is where the SMW stuff is really going to bring home the bacon for our stalwart genealogist and micro history contributors. Regarding promotion, I don't see any harm to it..... - ~  Ph l o x  16:56, 12 May 2009 (UTC)

That shouldn't have happened
Can you give me an example of the article where this occurred? - ~  Ph l o x  08:04, 15 May 2009 (UTC)
 * Never mind. I got it.  I will have it turned off shortly. Sorry.  - ~  Ph l o x   08:04, 15 May 2009 (UTC)
 * Should be back to normal now. Forms will be optional- clicked from the banner text above the edit window.  If there are any other oddities you notice, let me know as I am about to knock off for the night. - ~  Ph l o x   08:30, 15 May 2009 (UTC)

Ontology, i.e. categories etc
So SMW doesn't replace categories soon if ever but works with them?

It may reduce the value of producing lots of high-level, complex cats, such as "Migrants from Europe to Canada in the 1820s"?

— Robin Patterson (Talk) 12:06, 16 May 2009 (UTC)


 * Actually, it looks like they are categories with a bunch of extra features.
 * You can query on categories and properties, except you can add numeric or date range checks if they are properties
 * You assign the item to a property nearly like you do a category, except you use two colons instead of one, eg [[birth county::Gloucestershire]]
 * When you create a property page, the items marked with that property appear as a list.
 * except: sub properties are not listed (fixed in the SMW 1.2.1 release, so this limitation should go away soon)
 * except: lists are formatted differently - less compressed and with special features for examining the values.
 * You can declare a default form for a category, just as you can for a property
 * if you search for categories, it picks up articles from all subcats. The same is true of properties.  For instance, search on property state picks up death state, birth state, wedding1 state and so on.
 * creating sub properties is similar, but the syntax is slightly different. With cats you simply mention the supercat eg . If Counties of Colorado were a property, the corresponding syntax would be [[subproperty of::Property:Counties of Colorado]].
 * Categories and properties can be used interchangeably in queries. You can create a query, eg one that combines Migrants from Europe with Migrants to Canada with Migration date before 1830 and after 1819.  Migrants from Europe and Migrants to Canada could be either categories or properties.  Migration date could not be a category, due to the range comparison; it has to be a property.


 * So it is not so very odd after all. By the way, the combined query above can be saved as a "Concept".  There is a bit of hyperbole in calling it a concept, but the idea is that it is a saved query, so instead of typing in all that query again, you can simply type Concept:Migrants from Europe to Canada in the 1820s .  You can use these in other queries, or on pages just as if it were a property.  Concept:1750s births implements the category:1750s births.  There may be performance reasons for using Categories rather than Concepts, but I have not determined this yet.  They may cache the results on the server for all I know, but I notice a lag when I use concepts versus categories.


 * Hope this helps. Maybe we should collect these remarks somewhere useful. - ~  Ph l o x   18:15, 16 May 2009 (UTC)


 * Thank you. Yes, we should collect them somewhere else. Not all of our users recognize that "Ontology" can be interesting. I'll give some thought to listing useful talk pages somewhere accessible and recognizable. and here it is! One idea already formed is that we can make subpages of the SMW-site-derived Help page so that individual components can be expounded, locally illustrated, and discussed more easily. — Robin Patterson (Talk) 06:55, 17 May 2009 (UTC)
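
For anyone who finds the category/property/concept distinction easier to follow as code, here is a rough Python sketch of the combined query described above. It is not SMW syntax and not how SMW evaluates queries; the `Page` class, the sample people and the function name are all invented for illustration, and the migration date is simplified to a plain year.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    categories: set = field(default_factory=set)     # plain membership, no values
    properties: dict = field(default_factory=dict)   # name -> typed value


def migrants_europe_to_canada_1820s(page):
    """A saved, reusable query (loosely like a Concept):
    two membership tests plus a range check on a property value."""
    year = page.properties.get("Migration date")
    return (
        "Migrants from Europe" in page.categories
        and "Migrants to Canada" in page.categories
        and year is not None
        and 1819 < year < 1830   # a range check needs a value, hence a property
    )


# Hypothetical sample data.
pages = [
    Page("Person A", {"Migrants from Europe", "Migrants to Canada"}, {"Migration date": 1824}),
    Page("Person B", {"Migrants from Europe", "Migrants to Canada"}, {"Migration date": 1835}),
    Page("Person C", {"Migrants from Europe"}, {"Migration date": 1822}),
]

matches = [p.title for p in pages if migrants_europe_to_canada_1820s(p)]
```

The membership tests would work equally well whether "Migrants from Europe" is stored as a category or a property, but the date comparison only makes sense on something that carries a value.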

Info pages in relation to SMW
..... Am I right in thinking info pages are a key feature of our use of SMW? — Robin Patterson (Talk) 04:51, 17 May 2009 (UTC)
 * The SMW stuff is temporarily dependent on info pages for their data. I put in a patch so that we have a mirror of many (but not all) of the fields in the info pages.  This sort of jury rig was necessary so that I could quickly do some volume tests.  The SMW approach is to have what we think of as info data on the same page as the article, as is the case on the George Spencer Geer (1836) prototype page.  The Geer article wikitext is what SMW pages will look like in the future.


 * As I was remarking this morning, the Geer article is what a largely tabular article would look like. However, it is possible to set the values inline in a text passage like with the Robert Hester example I was referring to in the 2007 article.  Some folks may prefer this style, and that is in many respects preferable to the tabular form that tends to perform a life-sucking reduction of people's lives into a table of vital statistics.  - ~  Ph l o x   05:55, 17 May 2009 (UTC)


 * Thank you for all of that. I'd better check that I've read all your recent replies and essays before I ask too much more. Seems we have at least three of us working on this, with one doing over 99% but the others making an effort. It must be a good thing. I've mentioned it on two of my Squidoo lens updates now and copied to Twitter and Facebook. — Robin Patterson (Talk) 06:41, 17 May 2009 (UTC)
 * I am scouting out way ahead of the main party on some of this, so if I lost you, just ignore me. I just want to make sure I don't set it up a certain way that will wind up coming back to bite us.  Speaking of which, did you follow what I was saying about the migration kmz files?  ....., check out Google Earth and how it displays the notes.  Each one of the notes can have linkbacks to the genealogy wikia article.  - ~  Ph l o x   06:53, 17 May 2009 (UTC)

Nested and recursive queries
Hoping to build a query "What is the relationship between Ms Smith and Mr Jones?". As a first step, I'd need the intersection of the sets of ancestors of both. So, returns my parents, but  returns an error rather than my grandparents. Any hints? rtol 07:55, 17 May 2009 (UTC)


 * Your married cousin test answers my question. rtol 08:10, 17 May 2009 (UTC)
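
As a rough illustration of the "intersection of the sets of ancestors" step, here is a small Python sketch. It is not an SMW query and does not show how the married cousin test is implemented; the `PARENTS` table, the names in it and both functions are hypothetical.

```python
# Hypothetical parent links: person -> list of known parents.
PARENTS = {
    "Ms Smith": ["Smith father", "Smith mother"],
    "Mr Jones": ["Jones father", "Jones mother"],
    "Smith mother": ["Shared grandfather", "Shared grandmother"],
    "Jones father": ["Shared grandfather", "Shared grandmother"],
}


def ancestors(person, max_depth=10):
    """Return {ancestor: generations up}, following parent links one level at a time."""
    found = {}
    frontier = [person]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for p in frontier:
            for parent in PARENTS.get(p, []):
                if parent not in found:      # keep the shortest path only
                    found[parent] = depth
                    next_frontier.append(parent)
        frontier = next_frontier
        if not frontier:
            break
    return found


def common_ancestors(a, b):
    """Intersect the two ancestor sets, keeping the distance from each person."""
    up_a, up_b = ancestors(a), ancestors(b)
    return {name: (up_a[name], up_b[name]) for name in up_a.keys() & up_b.keys()}
```

With this sample data the shared ancestors are two generations above both people, which is what makes them first cousins. The `max_depth` cap plays the same role as a nesting limit on queries: without it the search grows with every generation.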

the cousin test
See the discussion on the talk page of married cousin test. I am turning in soon, so if you have any questions, you'd better fire them off soon.

I am not sure we would do the test this way, because it would require us to redundantly store children lists in the articles of both parents. My patch to info pages isn't doing that now, so we just have the list in either the father or the mother- that's why the grandparent lists are usually just one rather than four even though all may have info pages. We might do this inelegant redundant storage scheme due to performance reasons. We'd rely on bots to keep the lists consistent, and this will work if they are easy enough to use. The bot option will likely be, with some custom plugin(s) that I build for it. So it will be a lot easier to use than other bot tools (pywikipedia). - ~  Ph l o x  08:43, 17 May 2009 (UTC)

So I can't exactly put it in his article?
(I was impressed by Henry's lists of children and asked if I could put it in his article.) "Not exactly..." said the guru. So I tested it: http://genealogy.wikia.com/index.php?title=Henry_I,_King_of_England_(1068-1135)&action=history — Robin Patterson (Talk) 08:45, 17 May 2009 (UTC)
 * I don't see what's wrong, should be ok, but I have to turn in. I'll check it tomorrow.  - ~  Ph l o x   08:51, 17 May 2009 (UTC)

Test
Template:OrderCMTest9 works fine in general, but not for Emma de Limoges (c960-c991). I guess this is because she's a daughter from her father's second marriage. Thanks! rtol 11:39, 17 May 2009 (UTC)


 * The problem in OrderCMTest9 was Children-S2 v Children-s2. The latter works. rtol 16:51, 17 May 2009 (UTC)


 * Note that I also made children-s2 a subproperty of "family member". This may be wrong. rtol 17:01, 17 May 2009 (UTC)
 * Yes, wrong: should be a subprop of children. Not sure why you are explicitly probing children-s2 anyway.  Children-s2 is a subproperty of Children, so you should be picking up all children-s2, s3 etc.  If this is not the case, point me to a specific example.  Thanks.  Also, by the way, don't get too attached to these field names.  We are at the learning curve stage, so everything should be regarded as makeshift and experimental, and could be radically revamped with new names or different approaches.  OK?  I expect we should have things substantially sorted in the next month though.  For permanent work, no one should be encoding on the assumption that the current SMW fields will stay, except for experiments.  If people use Info fields, they are safe.  Ok?  - ~  Ph l o x   17:03, 17 May 2009 (UTC)


 * Have a look at Emma de Limoges (c960-c991). "children-s2" returns her dad. "children" returns empty. I agree that "children" should be a superset of "children-s1", "children-s2" ... (just think of the test Married third cousin) but it is not at the moment. rtol 17:13, 17 May 2009 (UTC)
 * I'll look at that. Hopefully it is not a flaw in SMW, because the superprop stuff does work sometimes for these depth searches.
 * Problem fixed. Thanks! rtol 17:51, 17 May 2009 (UTC)
 * By the way... there is a bottom up search for cousins by following the parentage links rather than the children links. The weakness of this approach is that folks usually add bunches of children but don't bother declaring the child nodes.  Again, we could deal with this if AWB is easy enough to be used by adept folks such as yourself.  Right now, the latest release gets hung on wikia, at login, but I'll deal with that if the AWB devs don't get to it soon.  - ~  Ph l o x   17:22, 17 May 2009 (UTC)
 * Cousins is easy enough. 13th cousin twice removed is hard. rtol 17:51, 17 May 2009 (UTC)

(undent) Good. Yes. An issue to be sure, and exponential expansion problems are generic to this problem set. There is a caching approach that I am considering that will alleviate some of the server load issues. - ~  Ph l o x  17:55, 17 May 2009 (UTC)
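
The subproperty behaviour discussed in this thread (a search on "children" should also pick up "children-s2") can be pictured with a short Python sketch. This only models the idea; it is not SMW's implementation, and the `SUBPROPERTY_OF` table, the sample facts and both helper functions are assumptions made up for the example.

```python
# Hypothetical declarations: subproperty -> its superproperty.
SUBPROPERTY_OF = {
    "children-s1": "children",
    "children-s2": "children",
    "children-s3": "children",
}


def includes(prop, wanted):
    """True if prop is `wanted` itself or a (possibly indirect) subproperty of it."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == wanted:
            return True
        seen.add(prop)                      # stop if declarations ever form a loop
        prop = SUBPROPERTY_OF.get(prop)     # climb one level up, or None at the top
    return False


def values(facts, wanted):
    """Collect values stored under `wanted` or under any of its subproperties."""
    return [value for prop, value in facts if includes(prop, wanted)]


# Hypothetical facts on one person's page, as (property, value) pairs.
facts = [("children-s1", "Child one"), ("children-s2", "Child two"), ("spouse", "Somebody")]
```

If "children-s2" were declared under "family member" instead of "children", `values(facts, "children")` would silently miss the second child, which matches the kind of empty result reported for the Emma de Limoges page.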

Query depth
I created a "Married third cousin test". This apparently fails (or rather, always returns true) because I nest 4 queries. Is that too much? The "married second cousin test" works fine. rtol 20:16, 17 May 2009 (UTC)
 * There is a query maximum nesting depth setting for the extension. You could have run into it, but that doesn't seem like a lot of queries to me.  15 queries to get to each set of gg grandparents, so 30 queries max, right?  You are in new territory here.  Let me know if you come up with anything.  - ~  Ph l o x   01:21, 18 May 2009 (UTC)
 * Pls have a look at Isabella of Portugal (1397-1472) (experiments). The first query correctly returns Edward II. The second query returns an alphabetic list of everybody. Should be empty. Thanks! rtol 05:38, 18 May 2009 (UTC)
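
The "15 queries each, 30 max" estimate above can be checked with a few lines of Python. This is just the arithmetic, under the assumptions that one lookup is needed per person to find that person's parents, that everyone has two known parents, and that no ancestor appears twice; the function name is made up.

```python
def parent_lookups(generations):
    """Number of 'who are this person's parents?' lookups needed to reach the
    ancestors `generations` levels up, with one lookup per person on the way."""
    return sum(2 ** level for level in range(generations))


# Third cousins share great-great-grandparents, who are 4 generations up:
# lookups on self (1), parents (2), grandparents (4), great-grandparents (8).
per_person = parent_lookups(4)
both_spouses = 2 * per_person
```

Each extra generation roughly doubles the total, which is the exponential growth mentioned earlier in the thread.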

That's the end of the bulk-copying. Feel free to add queries or comments on specific items above, but most new additions should be below under specific headings even if the subject has been started earlier. A note on the earlier discussion can helpfully lead to the resumed discussion.