(Moved here from Genealogy:Collaboration of the month, so it can get meaningful section headings better than on that general page (but the page history is still there). Robin Patterson 12:31, 27 February 2007 (UTC) )
(This is a discussion page until definite ideas or procedures arise.)
Hey Robin, Glad to see the Genealogy Wiki. This is exactly what I have been trying to accomplish for about 10 yrs now.
In my past experience, I tried to have a database of all individuals I was researching. Through experience I found that each particular Surname needed its own database, which included variations of that Surname's spelling. This worked quite well, but I did have trouble when I included comments on that individual. This made the database quite large, and searches became very slow.
These comments are best made in a wiki application such as this one. Rather than recoding your current wiki application to include a database that only this application could use, what if you had an external database? This information could then be used by other applications as well, such as charting, timelines, mapping, etc.
I have started playing with this idea recently with the new options available at zoho.com and google spreadsheets. The database can be embedded into your wiki very easily with links back to the wiki as well.
Some examples for thought:
The zoho example can be found at: *http://creator.zoho.com/cobnet/crawford-webbase/ You should notice a link near the top left hand side of the page for "Embed this page into your site".
The Google Spreadsheet examples can be found at: *http://spreadsheets.google.com/pub?key=pAPPDTKnzhFoC_qmk6yovTQ
If you have a chance to look at both examples you may find they could be a solution for your problem. Myself I like the zoho example better simply because it contains links in the page for Add, Edit, Search, etc. The Google example requires more coding and is less user friendly in my mind. Notice the "Resource" tab in the zoho example above. One could very easily add a link to the Wiki page of any particular individual there.
Also note the "Filter" option near the top left of the spreadsheet of the "Resources Tab". One could easily sort the list for any particular individual by INDI_ID, which would result in the link to the WIKI PAGE if one was made. This is not very user friendly, since most folks will not know the INDI_ID of the person they are searching for. So I made yet another tab titled: "GenCard". Now I can see all information on an individual at once.
There is lots more work to be completed here before this is a completed option. Is there a way to embed any of these spreadsheets into this wiki? If it cannot be embedded there may be other options with the "More Actions" link?
For anyone wishing to test this idea further, choose the "Copy This Application" link near the top of the above Crawford example; the application can then be copied and a new Surname Database started. All comments, ideas, and suggestions are greatly appreciated. I have noticed that a visitor wishing to ADD, EDIT, etc. may need to be registered at the zoho site?
- Thanks, Mark. I've encouraged a couple of the active members to look at your new work. Bill had a look. He's probably the expert on embedding things. I know the art of embedding and including templates and other things inside templates is well developed on Wikipedia and moving that way in some of the Wikia sites. I expect we can accommodate that sort of thing. Robin Patterson 11:32, 28 January 2007 (UTC)
- Hi Mark
- I did look at your material, and probably need to look at it more closely. Looking forward to a potential application of what you've done raises a question....perhaps I don't understand something, and I'd like to hear how you might apply your work on this wiki. But my question---wouldn't this require the end user to create a spread sheet style data base of some sort, or at a minimum require that they insert their information into an existing spreadsheet on the site? Bill 14:17, 13 February 2007 (UTC)
Sorry it took me so long to reply. You are correct; however, adding info to the spreadsheet can be done right from the wiki itself. Please be patient with me as I try my best to make a demonstration of this. I don't have as much time as most folks to work with this, but I will make progress.
You might look more at the formatting of the data rather than where the data is contained. It is more important to come up with a standard format that can be used by many applications. I am not saying what I have currently is the "standard format", but I am trying my best to get as many folks as I can to input their thoughts on such a standard. The GEDCOM standard is pretty widespread; however, I feel this format is bloated, and a database can hold the same information in less space than GEDCOM uses.
I am glad to see that you and Robin are interested in this area of genealogy research. It is very difficult to find folks who are interested in the programming side of genealogy. Robin has been a great help to me in the past and is very thorough with advice.
I will try to get back with this site when I have more info to share. I don't want to hold anyone up on their work here. As I said previously, I would be more concerned with the database formatting of the individual data than where that data was contained. More specifically, I am hoping for access to this data from many applications rather than an Individual taking a whole lot of time to enter that data, then when a new application comes around, such as mapping, having to enter that same data again for the new application.
Hopefully you can download the data from our previous webBases into your new application without too much work. You can download this data in many different formats (RSS, XLS, etc.); simply choose the "More Actions" button from this link: http://creator.zoho.com/cobnet/cochrane-webbase/ to see all the ways to download this info. This should save a lot of effort in moving this data to other applications.
Until next time, good luck.
- Good stuff, Mark. (By the way - please don't start lines with blanks; see http://genealogy.wikia.com/index.php?title=Genealogy:Collaboration_of_the_month&oldid=25112 .)
- Keep it up, programmers! --Robin Patterson 06:10, 24 February 2007 (UTC)
- Thanks Mark, appreciate the feedback. I sometimes think that the potential speed of the internet does us something of a disservice. I've got almost all of my father's genealogical correspondence from the 60's until his death in the 90's---thirty years' worth of letters---the old fashioned kind filled with "Sorry I didn't get back to you sooner, but we had a wedding to plan, then John was in the hospital for a month...." When I go through the correspondence, and realize how much time it took to get any replies, I'm amazed at what he was able to accomplish. I've been in correspondence with some of the people he wrote to, and consistently, even though they are now working the internet, it's that same delay between message and response---sometimes months go by! But the correspondence does keep flowing, and it's sort of fun to see what they now think, these folks who were corresponding with my father 30 years ago.
- But I'm digressing---that was just a way of saying, I'm used to, and have an appreciation for, long gaps in conversations.
- I'll look forward to your further thoughts on this. When I look at the data table you've adopted, I can see, of course, where you are going with this. I can easily see how we could craft a robot to go to another site, and extract the needed information. Your list is reasonably comprehensive, and could be used in this way. The fact that it is in a set format would definitely facilitate the work of the robot. (Read, make its work possible). I think the problem here is that a) it would be outside of the control span for Wikia, and b) there would be no guarantee that the database would always be available. To meet the needs of this wiki, I think a database like this would pretty much have to be housed somewhere within the Wikia community, preferably within this wiki.
- However, what I'm thinking of is a bit different. Among other things, I'm trying to reduce the number of things a user has to know about. To that end I've experimented a bit with formatted tables for entry of Vita data such as yours. That includes both the Vita Box that appears on many of my pages, plus the child list table. I'm still experimenting with those. Because this requires HTML programming, the use of these tables currently exceeds the capabilities of most genealogists. Not that HTML programming is that hard, it's just that it's more than most are willing to undertake. So what I'm looking for eventually is to create versions of these tables that include embedded text or input boxes. When someone creates an article the basic framework would appear in the article in the appropriate places. When they edit the article, the input boxes would be shown, and they could input whatever they might want. When the edit page closes, the display page would show the vita box etc., with the newly added information. This process would be repeated every time the page was edited (or perhaps, every time the section containing the tables was edited). This would avoid having to create a separate and independent database either within or outside of the Wiki. The only requirement here is that a specialized extension of the underlying MediaWiki programming would be needed to make this work. That and the creation of other extensions that would transfer information from page to page---that's required in order to meet the objective of eliminating double entry of data on spouse, child, and parental pages.
- This is going to take a while to implement. I've got some work things that I have to clean up prior to retirement, and that's going to sap my time for a while. But creating something like this is not exactly simple. There are a lot of ins and outs to setting up something like this. So I figure this is at least a 6 month project to get the basics down right. Bill 14:08, 24 February 2007 (UTC)
- I suspect this can be handled better in email, and would go that route, but Mark hasn't registered, and I can't get to his email address that way, so we will have to muddle along with this approach. Mark, you need to register anyway, and leave a time and date stamp on your messages. I will intersperse my thoughts on Mark's comments. They will be shown in red below. Bill 13:33, 27 February 2007 (UTC)
Food for thought on data formatting and tables,
Thanks for the tip, Robin, I wasn't aware I was starting a line that way.
On this topic: "Among other things, I'm trying to reduce the number of things a user has to know about. To that end I've experimented a bit with formatted tables for entry of Vita data such as yours. That includes both the Vita Box that appears on many of my pages, plus the child list table. I'm still experimenting with those."
Do you have a link to this Vita data?
- Here's an example. (The content below is tucked away on a subpage, so all you'll see when you edit this page is the "magic word phrase" that inserts the Vita Box)
|DOB:||February 1811||inferred from DOD and age on gravestone|
|DOD:||January 22 1848||Based on Gravestone transcription; Stone gives age as 36 years 11 months; identifies her as the "wife of Benjamin Givens". Some give the DOD as 27 Jan 1848|
|POD:||Armiesburg, Parke Co In|
|Burial:||Hesler Cemetery, Catholic Cemetery,near Montezuma IN||Nancy was originally buried in the Hesler Cemetery; An early transcription of the Hesler cemetery shows her stone as the only standing stone in the cemetery; others are described as "underground". A later transcription by Bernice A. Reeder notes that the same stone (same name, text, dates) was now in the Armiesburg/Catholic Cemetery, and indicates that it had been moved. See Hesler Farm/Bug Island Cemetery for what appears to be the old transcription|
|Spouse:||Benjamin Givens (1809-1852)|
|Father:||Phillip Wannamaugher (1780-1835)|
|Mother:||Katherine Justice (?-1851)|
- Here's an example of the child list.
Nancy Ann Wannamaugher (1811-1848)/ChildList
The child list can be solved easily, if you notice the Parents_ID field. Any children of these parents can be found by this field within the same table. The Individual is found by the INDI_ID field. I tried many different ways to come up with a value for this field and found that a number is the best way for me. This number can be derived many different ways as long as it is a unique value. The Parents_ID field has 3 parts: Part 1 is the parent's INDI_ID value, Part 2 is simply an M used as a separator, Part 3 is the number of the family in which this particular individual has children. Most folks would be a 1; however, sometimes there are 2 different families with children, so it could progress.
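As a rough illustration of the Parents_ID idea, here is a minimal Python sketch. It assumes the key is simply the parent's INDI_ID, the letter M, and the family number joined together; the `Person` class, the helper names, and the sample people and ID numbers are all made up for the example and are not taken from the zoho webBase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    indi_id: int                      # unique number for this individual
    fname: str
    parents_id: Optional[str] = None  # e.g. "1001M1"; None if parents unknown


def family_key(parent_indi_id: int, family_number: int = 1) -> str:
    """Build a Parents_ID: parent's INDI_ID + 'M' + family number."""
    return f"{parent_indi_id}M{family_number}"


def children_of(people: List[Person], parent_indi_id: int,
                family_number: int = 1) -> List[Person]:
    """Return everyone whose Parents_ID matches the given parent and family."""
    key = family_key(parent_indi_id, family_number)
    return [p for p in people if p.parents_id == key]


# Hypothetical sample rows: one parent with children in two families.
people = [
    Person(1001, "Parent"),
    Person(1002, "First Child", "1001M1"),
    Person(1003, "Second Child", "1001M1"),
    Person(1004, "Third Child", "1001M2"),
]

first_family = [p.fname for p in children_of(people, 1001, 1)]
second_family = [p.fname for p in children_of(people, 1001, 2)]
```

Because the children live in the same table as the parent, no separate child-list table is needed; the lookup is a single filter on one field.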
"Among other things, I'm trying to reduce the number of things a user has to know about." Using the above example to look up an individual's name would require only knowing his INDI_ID. Finding his children would use this same number with "M1" added to it to find the first family and adding "M2" to find the second family, etc.
- So in your scheme of things every person entered has their own X-digit ID number. The use of ID numbers has some advantages, and was considered early on in this wiki. However, at some point people seemed to abandon that idea. Looking at the old discussions of this on the watercooler, people just seemed to give up on the concept. There's probably a good reason for that. ID numbers are the kind of thing that programmers love (very logical, a number for everything: just tell me the number and I'll tell you the thing). Unfortunately, most people don't work very well in that kind of system. Among other things, you have to keep track of all of those numbers in order to make use of them. And good ol' number 3425789 just doesn't have the same mnemonic value as, say, "Mary Ann Whatever". People are not going to relate to or remember numbers. If you use number IDs you have to set it up in such a way that they are transparent to the end user.
- In reality, we already HAVE a database built into the wiki. We are not likely to create a secondary database to support the wiki---among other things, a) the upkeep would eventually kill us, and b) it would end up being housed outside of the wiki---I don't think we're likely to get the support from Wikia for them to do this just to make this one small wiki happy. I'm sure they would support us in some ways, but setting up a separate database? That's not likely to happen. So if this route were the one we go down, we'd have to do it ourselves, and we'd have to do it in another location. No support a'tal from Wikia on something like that. Indeed, I have to wonder if they'd be downright hostile to the idea. Also, c) whatever we do needs to be transparent to the end user. We want to simplify things for them, not make things harder. And going to a different location to enter data, and then having to go somewhere else to see it, isn't making things easier or transparent.
Now for the following generations: At first I had my program find an individual, then that individual's Parents_ID, then look for that individual and his Parents_ID and so forth until a complete family tree was found. I quickly found this took the program quite a while to create a tree using this method. So I created a new field titled: "Tree_ID" which would allow any characters or numbers. Pretty much at the researcher's choice as long as each Tree_ID was unique. Now with one pass the program could pull the entire tree and then loop thru the tree to place each individual into their appropriate place. This was a great speed improvement on creating the family trees.
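To make the speed-up concrete, here is a small Python sketch of the one-pass approach. It is only an assumption about how such a program might work: the rows are plain dictionaries, the field names follow the ones mentioned in this discussion (INDI_ID, Parents_ID, Tree_ID), and the sample names and values are invented.

```python
from collections import defaultdict

# Hypothetical rows; "Name" and all values are made up for illustration.
rows = [
    {"INDI_ID": 1, "Name": "Grandparent", "Parents_ID": None, "Tree_ID": "T1"},
    {"INDI_ID": 2, "Name": "Parent", "Parents_ID": "1M1", "Tree_ID": "T1"},
    {"INDI_ID": 3, "Name": "Child", "Parents_ID": "2M1", "Tree_ID": "T1"},
    {"INDI_ID": 9, "Name": "Unrelated", "Parents_ID": None, "Tree_ID": "T2"},
]


def build_tree(rows, tree_id):
    """Pull every member of one tree in a single pass, then link them up."""
    # One pass over the table instead of one lookup per generation.
    members = [r for r in rows if r["Tree_ID"] == tree_id]

    children = defaultdict(list)  # parent INDI_ID -> list of child INDI_IDs
    roots = []                    # members with no recorded parents
    for r in members:
        parents_id = r["Parents_ID"]
        if parents_id is None:
            roots.append(r["INDI_ID"])
        else:
            # The part before the "M" separator is the parent's INDI_ID.
            parent_id = int(parents_id.partition("M")[0])
            children[parent_id].append(r["INDI_ID"])
    return roots, dict(children)


roots, children = build_tree(rows, "T1")
```

The generation-by-generation method needs a fresh search for every ancestor, while this version touches the table once and does the linking in memory.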
Other Event related data to this particular individual could easily be found using the same INDI_ID plus searching the Event_Type field for whichever data you were looking for such as: Birth, Death, Marriage. You could easily pull all info by just searching the INDI_ID field.
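A lookup along those lines might look like the following Python sketch. The Event_Type and INDI_ID field names come from the description above; the "Date" and "Place" fields, the function name, and the ID numbers are assumptions added for the example, while the dates are borrowed from the Vita Box sample earlier on this page.

```python
# Hypothetical event rows keyed by INDI_ID.
events = [
    {"INDI_ID": 1001, "Event_Type": "Birth", "Date": "February 1811", "Place": ""},
    {"INDI_ID": 1001, "Event_Type": "Death", "Date": "January 22 1848",
     "Place": "Armiesburg, Parke Co In"},
    {"INDI_ID": 1002, "Event_Type": "Birth", "Date": "1809", "Place": ""},
]


def events_for(events, indi_id, event_type=None):
    """Return all events for one individual, optionally limited to one type."""
    return [
        e for e in events
        if e["INDI_ID"] == indi_id
        and (event_type is None or e["Event_Type"] == event_type)
    ]


everything = events_for(events, 1001)         # all events for INDI_ID 1001
deaths = events_for(events, 1001, "Death")    # only the Death event
```

Leaving out the event type gives the "pull all info" case; passing one narrows the search to a single kind of event.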
The Parents Table shows only the INDI_ID of Parent 1. This is because whoever Parent 1 is, if it is either the Mother or Father, that individual should be in the current database. This would eliminate having to store Parent 1's name information into 2 different tables and also makes editing that same name information very easy. The INDI_ID, FName, and Surname of Parent 2 is also collected here. The second Parent, Parent 2 would attempt to show the location of that Parent's info in another database, if one existed. This was not always the case, so we attempted to at least collect that Parent's First Name and Surname. Any other info could easily be placed into the Comments table.
Lastly, if you plan on keeping all the information on ALL SURNAMES in one database, good luck. I started out this way and it very quickly got out of hand. One SURNAME alone was over 5 MEGS and a second wasn't too far from that. I am not saying it cannot be done, but it will be a large challenge.
- Yes, it probably would be a challenge going a separate database route. But the amount of space required really isn't a factor here. It's a factor when you are working on relatively small systems. I don't know exactly what Wikia's setup for this is like, but if you are doing something like setting up 2000+ wikis, each one open-ended, space is probably NOT one of your limiting factors. Yes, it's probably expensive. I'm sure Google has this as a problem in spades, but that's what those ads on the side bars are all about---making space available (and of course, having a positive cash flow).
You can quickly shorten the space needed to store the information if you do not store the SURNAME. Imagine 10,000 individuals all with the Surname of "Crawford". They all have the same name, so why store that? You may notice that the current link above for the Cochrane webBase lists a Variants tab. At first I chose to place all variant spellings of a Surname in one database. I am currently re-thinking this one and it probably should not be there. Each database should contain one and only one Surname. If this is to be redone, then the Parents Table above would have to expand Parent 1's field to include both FName and Surname so that parent could be located as Parent 2 is located. Sometimes a Father's Surname is not spelled the same as his children's Surname?
Hopefully this one INDI_ID would allow us to collect all the information we wanted about any individual in the database. Now, it does not necessarily have to be a number, but it does need to be unique for each individual. There are many ways to accomplish this. As I said earlier, this is just food for thought, I totally understand the taking time to implement part. Good Luck on the Retirement Part. More to come later, Mark
PS: I found the Vita Box, GREAT WORK HERE! You may not need me after all, lol. Is this data accessible to outside apps, such as mapping? Can one search the POB field of a table and have all names, dates, etc. returned for, say, a Country? This would allow a great mapping app to use your data, and whenever your data was updated, the map app would be updated as well?
- Thank you, appreciate the compliment. In theory, anything on the wiki should be available---but the trick is, you have to access the existing wiki database. We do that all of the time, of course, but that's because the underlying program is set up to accomplish that. When you click a link and a new page appears (all within the wiki) what's happening is that a signal is being sent to the database, and the appropriate bit of information is retrieved and displayed.
- That part works like a charm. The problem is, it's designed to work with whole pages. We want to access bits and pieces of pages, and create new pages including the various bits and pieces. The Wiki programming isn't set up to do that---currently---that's what developing the extensions is about.
- With regard to mapping apps, theoretically, it would not be a problem to create a robot to search the wiki database, zero in on target words like "POB:", etc., and extract them for export to another page on the Wiki.
You could, for example, create a robot to create a list of everyone born in 1808, etc. We do this now with "categories", but the category labels have to be created by hand. You could create a robot that would create them for you. Then create a page that would list things in whatever combinations you wanted---say "Smiths" born in 1808. etc.
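As a very rough sketch of what such a robot might do, the Python below scans page text for Vita Box rows written in the "|Label:||value||note|" form shown earlier on this page and lists the pages whose DOB mentions a given year. It is only an illustration under stated assumptions: the page texts are supplied in a plain dictionary rather than fetched from the wiki database, the function names are made up, and the second sample page is invented.

```python
import re

# Hypothetical input: page title -> page text. A real robot would have to
# read these from the wiki; that step is not shown here.
pages = {
    "Nancy Ann Wannamaugher (1811-1848)": (
        "|DOB:||February 1811||inferred from DOD and age on gravestone|\n"
        "|POD:||Armiesburg, Parke Co In|"
    ),
    "Example Person (1808-1870)": (
        "|DOB:||3 March 1808|\n"
        "|POB:||Example County|"
    ),
}

# Matches a row label such as "DOB:" and the first value that follows it.
ROW = re.compile(r"\|(\w+):\|\|([^|]*)")


def vita_fields(text):
    """Extract label -> value pairs from Vita Box rows in one page's text."""
    fields = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def born_in(pages, year):
    """List page titles whose DOB field mentions the given year."""
    hits = []
    for title, text in pages.items():
        dob = vita_fields(text).get("DOB", "")
        if re.search(rf"\b{year}\b", dob):
            hits.append(title)
    return sorted(hits)


born_1808 = born_in(pages, 1808)
```

The same field extraction could feed a mapping app by reading "POB" instead of "DOB", which is why a set format for the rows matters more than where the rows are stored.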
- I think there are two main priorities for this
- a) We need something that is as transparent as possible for the end user. We don't want them to learn HTML programming just to use the wiki effectively. (They don't have to now, but they do have to manually enter the same data wherever it's needed. That's what we want to eliminate.)
- b) We do not want to reinvent the wheel. We have an existing database that we are currently tapping. That's being maintained with the current upkeep of Wikia. We want to tap into that database, and use it in ways that Wikia has not currently enabled. Bill 14:41, 27 February 2007 (UTC)
(From here on, all is new; insertions may also be made above where appropriate. PLEASE REMEMBER TO SIGN AND DATE ("~~~~") Robin Patterson 12:31, 27 February 2007 (UTC) )