This post highlights how Peter Bradish developed tools based on US voter lists to provide new datasets for American Rubys. For those unfamiliar with US voter lists (like me), the end of the post has an excellent FAQ to enlighten you, including a technical explanation. Thanks to Peter for sharing his innovative approach and to Paul Howes for his input.
A year ago I responded to the plea for Ruby Project volunteers and took on the nine Rubys listed in the 1900 US Census in Florida. Since then I’ve learned and discovered a great deal. After taking the Pharos “Introduction to One-Name Studies” course last winter I understood the great potential of bulk data gathering for an ONS newbie such as myself.
After first applying the process to my own ONS (e.g., using FreeBMD, the database of births, marriages and deaths in England and Wales), I realized its potential for the Ruby Project when it was suggested that I gather the Ruby information from the Florida Registered Voters online database. With nearly 400 Rubys, cutting and pasting or re-keying names, gender, dates of birth and addresses looked like a month’s worth of boredom and stress. So I applied the techniques I’d learned for easily gathering and recording data from FreeBMD.
Basically this meant downloading the Florida voter database (67 county files totaling 2GB), writing and using a simple program to select only the Ruby records (60KB), employing a spreadsheet to select, sort and clean relevant data, and writing a program to convert the data exported from the spreadsheet to a GEDCOM file for sending to the project team leader to merge into the Ruby Project master file.
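The surname selection step can be sketched in a few lines of Python. This is only an illustration, not Peter's actual program: it assumes each county file is tab-delimited text with no header row and that the surname sits in a fixed column. The column index, the `.txt` extension and the function names are placeholders to adjust after inspecting the real files.

```python
from pathlib import Path

# Assumption: the surname is in this column (counting from 0) of each
# tab-delimited line. Check the state's file layout before relying on it.
SURNAME_COLUMN = 2


def select_surname(lines, surname, column=SURNAME_COLUMN):
    """Yield the fields of every line whose surname column matches, ignoring case."""
    wanted = surname.strip().lower()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > column and fields[column].strip().lower() == wanted:
            yield fields


def extract_from_folder(folder, surname, out_path):
    """Scan every .txt file in a folder and write matching lines to one file."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(folder).glob("*.txt")):
            # errors="replace" keeps the scan going past any badly encoded bytes
            with open(path, encoding="utf-8", errors="replace") as county_file:
                for fields in select_surname(county_file, surname):
                    out.write("\t".join(fields) + "\n")
                    count += 1
    return count
```

Calling `extract_from_folder("voter_files", "Ruby", "ruby_records.txt")` would then reduce the county files to one small file that opens comfortably in a spreadsheet. Reading line by line means the full 2GB never has to fit in memory.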
It took a long time to accomplish the Florida Ruby voter data extraction and conversion. However, having learned how to do it for one state, applying that learning to the other states with online voter rolls went a lot quicker. Now it only takes a couple of hours to do a single state, plus the time to determine how the data has been recorded and to write a source citation.
[Figure: Florida Population Map - 2010]
EXAMPLE OF VOTER LIST INFORMATION
Ruby, John James was born 17 September 1974, is male, registered as No Party Affiliation, residing at 700 Villa San Marco Dr, #444, St Augustine, Florida 32086-4142. Florida voter ID number 123456789. The voter lists a mailing address and probably prefers you use it: 11111 S 11Th East Ave Bixby OK 74008-8227. This is the most recent information, from the Florida voter list as of 30 November 2018.
Previous information:
31 May 2013 voter list: John James Ruby, 700 Villa San Marco DR, #444, St Augustine, FL 32086 No Party Affiliation.
There are nine states with easily accessible voter databases: Arkansas, Colorado, Connecticut, Delaware, Florida, Michigan, Ohio, Oklahoma and Rhode Island. Each has an online capability allowing the user to drill down by surname and given name to view individual records. A total of 2,015 records have been extracted from those nine states for the Ruby Project. Given that the nine states make up about 20% of the US population, we estimate that there may be as many as 10,000 Ruby people in the US!
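The arithmetic behind that estimate is a simple scale-up, shown here as a rough sketch. It assumes Rubys are spread across the country in proportion to the general population, which is an assumption rather than something the voter lists demonstrate, and it counts registered voters only.

```python
records_found = 2015      # Ruby records extracted from the nine states
population_share = 0.20   # approximate share of the US population in those states

# Scale the nine-state count up to the whole country
estimate = records_found / population_share
print(round(estimate))
```

The result is a little over 10,000, in line with the figure quoted above.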
Just extracting the data is not the end of the process,
though. To make the data meaningful, one
has to group families living at the same address, which is a process that
cannot be mechanized. For two states
(Ohio and Oklahoma), one has to guess the sex of the individual too, which is
quite time-consuming and not as simple as it seems given modern naming patterns
in America!
The resulting data has led to some insights. First, we have a ready supply of modern information to help with family reconstruction and to add to future obituaries as we spot them. Second, there is a very small proportion of people (well below 1%) who are registered to vote in more than one state! Some of that is no doubt due to people moving around and the timing differences in online databases. Florida’s data is practically up to date, whereas Delaware’s dates from 2015.
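The post does not say how the multi-state registrations were spotted. One plausible approach, sketched below under that assumption, is to group the combined records by name and date of birth and flag any person who appears under more than one state. The field names (`given`, `surname`, `dob`, `state`) are hypothetical, and a match on name and birth date is only a lead to check by hand, not proof that two records are the same person.

```python
from collections import defaultdict


def find_multi_state(records):
    """Return {(given, surname, dob): [states]} for people listed in 2+ states.

    Each record is a dict with hypothetical keys 'given', 'surname',
    'dob' and 'state'.
    """
    states_by_person = defaultdict(set)
    for rec in records:
        key = (
            rec["given"].strip().lower(),
            rec["surname"].strip().lower(),
            rec["dob"],
        )
        states_by_person[key].add(rec["state"])
    return {
        key: sorted(states)
        for key, states in states_by_person.items()
        if len(states) > 1
    }
```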
Lastly, it’s worth noting that the Florida population has grown from just over half a million people in 1900 to over 21 million today, a roughly 40-fold increase. Meanwhile, the number of Rubys has gone from 9 to over 400, a more than 44-fold increase, and that’s not even counting people below the voting age! Something about Florida is particularly attractive to people named Ruby!
The FAQs
Are all the individuals in the voter lists living? If so, how does one connect them to a line of Rubys going backward?
All the people are "supposed" to be living but some don't get removed from their state's voter list when they pass away or move out of state. We spotted several duplicate people who have moved between states and one or two who have died since the lists were prepared. The state has to be notified of the person's death or removal. We suspect some states have a period of inactivity (i.e., the person not voting) which triggers a requirement to re-register but we haven't checked each state's requirements. That people are on the list is sufficient evidence of their existence, however. We can only connect them going backward if we have the records to do so, such as documented birth date and/or marriage and/or name(s) which match. We were able to spot about 100 people who matched with records we already had and merged them in as we added all the data to our master file.
Do the households identify the relationships between individuals living in the same household?
No. We have made our best guess about relationships based on the ages and sexes of people living at the same address. On the other hand, since everyone is alive, we are following good genealogical practice and are not publishing the data ourselves. So it's not crucial whether we have got the relationships correct. We have plenty of time to sort out any errors.
What is the span of years of a voter list, for example, the frequency of updating for people deceased, or no longer in the state?
The situation varies by state. Some are updated daily (on working days), while others provide new lists annually or less frequently. All the voter registration records are public information and not subject to the privacy laws you might expect.
How did you move these lists from data to genealogical software or directly to a website database?
I wrote a program to extract specific surname records (e.g., Ruby) from the massive complete state databases and convert them to a CSV (comma- or tab-separated values) file which can be imported into a spreadsheet. Once the data was imported, I cleaned out the non-genealogical data (e.g., past voting history, which is whether they voted or not, in person or absentee, registration date, etc.). Then ALL CAPS words and names were converted to first capital only. This is followed by concatenation of the address fields into one residence place string. Then first and middle names are concatenated into a given name. Then birth dates are all converted from the US date format, MM/DD/YYYY, into the European format, DD/MM/YYYY, used by most genealogists. All columns are then arranged into a common order which I use for the program I wrote to convert the data into a GEDCOM file with name, DoB and residence tags and a source citation for the voter registration database. I also typically add a column with a "sortable" DoB which can be used to "properly" sort the data in actual date order, as the data is normally only provided in text format which would sort by month (US) or day (European) order. As you can appreciate, this is the technical part which won't be simple for most genealogists. This is the reason we think the Ruby study has pioneered the use of such information.
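For readers who would rather script these steps than do them in a spreadsheet, here is a minimal Python sketch of the same transformations. It is an illustration under stated assumptions, not Peter's program: the function names, the dictionary keys and the `@S1@` source identifier are all hypothetical, and the GEDCOM layout shown is one simple arrangement of NAME, BIRT, RESI and SOUR tags.

```python
from datetime import datetime

# English month abbreviations, spelled out so the output does not depend on locale
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def clean_record(surname, first, middle, address_parts, dob_us):
    """Tidy one voter record; dob_us is text in US MM/DD/YYYY format."""
    born = datetime.strptime(dob_us, "%m/%d/%Y")
    # str.title() is a rough fix for ALL CAPS text: it also turns "FL" into
    # "Fl" and "MCDONALD" into "Mcdonald", so a manual check is still needed.
    given = " ".join(part for part in (first, middle) if part).title()
    residence = ", ".join(p.strip().title() for p in address_parts if p.strip())
    return {
        "surname": surname.title(),
        "given": given,
        "residence": residence,
        "dob": born.strftime("%d/%m/%Y"),           # European display format
        "dob_sortable": born.strftime("%Y-%m-%d"),  # sorts correctly as plain text
        "dob_gedcom": f"{born.day} {MONTHS[born.month - 1]} {born.year}",
    }


def to_gedcom(rec, number, source_id="S1"):
    """Build one GEDCOM individual record; source_id is a hypothetical label."""
    return "\n".join([
        f"0 @I{number}@ INDI",
        f"1 NAME {rec['given']} /{rec['surname']}/",
        "1 BIRT",
        f"2 DATE {rec['dob_gedcom']}",
        "1 RESI",
        f"2 PLAC {rec['residence']}",
        f"1 SOUR @{source_id}@",
    ])
```

Using the example voter from earlier in the post, `clean_record("RUBY", "JOHN", "JAMES", ["700 VILLA SAN MARCO DR", "ST AUGUSTINE", "FL"], "09/17/1974")` gives a given name of "John James" and a sortable date of "1974-09-17". The year-month-day form is what makes a plain text sort come out in true date order.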
OK, I accept this is new and appreciate that you will not be publishing the data. So why bother with it at all?
Good question! Yes, we will NOT be publishing this information online until we know a particular individual has died. However, we have already found it helpful. Let's say we have a daily feed from companies providing obituaries like Tributes.com or Legacy.com. Many American obituaries contain lots of information on descendants and other relatives who are still alive. If those folks live in one of the nine states, we can immediately connect them up with accurate relationships, birth dates and so on. Same thing with news items: having a large number of living RUBYs allows us to identify many people who appear in news items and make our database a whole lot more interesting! To summarize, look at it this way: with the voter lists in our file we have tomorrow's census releases today!