
The Ruby study pioneers the use of online US voter rolls

This post highlights how Peter Bradish developed tools based on US voter lists to provide new datasets for American Rubys. For those unfamiliar with US voter lists (like me), the end of the post has an excellent FAQ to enlighten you, including a technical explanation. Thanks to Peter for sharing his innovative approach and to Paul Howes for his input.

A year ago I responded to the plea for Ruby Project volunteers and took on the nine Rubys listed in the 1900 US Census in Florida. Since then I’ve learned and discovered a great deal. After taking the Pharos “Introduction to One-Name Studies” course last winter, I understood the great potential of bulk data gathering for an ONS newbie such as myself.
I first applied the process to my own ONS (e.g., using FreeBMD, the database of births, marriages and deaths in England and Wales). I realized its potential for the Ruby Project when it was suggested that I gather the Ruby information from the Florida Registered Voters online database. With nearly 400 Rubys, cutting and pasting or re-keying names, gender, dates of birth and addresses looked like a month’s worth of boredom and stress. So I applied the technique I had learned for easily gathering and recording data from FreeBMD.
Basically this meant downloading the Florida voter database (67 county files totaling 2GB),
writing and using a simple program to select only the Ruby records (60KB), employing a
spreadsheet to select, sort and clean relevant data, and writing a program to convert the data exported
from the spreadsheet to a GEDCOM file for sending to the project team leader to merge into the 
Ruby Project master file.
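For readers who like to see the idea in code, here is a minimal Python sketch of the "select only the Ruby records" step. It is not Peter's actual program. It assumes the county files are tab-delimited text files ending in .txt, and that you already know which column holds the surname; the function name `extract_surname` and its parameters are hypothetical.

```python
from pathlib import Path


def extract_surname(src_dir, out_path, surname, surname_col, delimiter="\t"):
    """Copy every line whose surname column equals `surname` into one file.

    Assumptions (not taken from any state's documentation): one record per
    line, fields separated by `delimiter`, surname in column `surname_col`
    (counting from 0). Returns the number of records kept.
    """
    wanted = surname.strip().upper()
    out_file = Path(out_path).resolve()
    kept = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.txt")):
            if path.resolve() == out_file:
                continue  # never read back the file we are writing
            with open(path, encoding="utf-8", errors="replace") as src:
                for line in src:
                    fields = line.rstrip("\r\n").split(delimiter)
                    if len(fields) <= surname_col:
                        continue  # short or blank line
                    # Exact match, ignoring case and stray spaces
                    if fields[surname_col].strip().upper() == wanted:
                        out.write(line.rstrip("\r\n") + "\n")
                        kept += 1
    return kept
```

A call such as `extract_surname("county_files", "ruby_only.tsv", "Ruby", surname_col=2)` would read each county file once and write only the matching lines. The match is exact, so variants such as Rubye would need their own pass.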
It took a long time to accomplish the Florida Ruby voter data extraction and conversion. However, having learned how to do it for one state, I found the other states with online voter rolls went much more quickly. Now it only takes a couple of hours to do a single state, plus the time needed to work out how the data has been recorded and to write a source citation.
Florida Population Map - 2010

Ruby, John James was born 17 September 1974, is male, registered as No Party Affiliation, residing at 700 Villa San Marco Dr, #444, St Augustine, Florida 32086-4142. Florida voter ID number 123456789. The voter lists a mailing address and probably prefers you use it: 11111 S 11Th East Ave Bixby OK 74008-8227. This is the most recent information, from the Florida voter list as of 30 November 2018.
Previous information:
31 May 2013 voter list: John James Ruby, 700 Villa San Marco DR, #444, St Augustine, FL 32086 No Party Affiliation.

There are nine states with easily accessible voter databases: Arkansas, Colorado, Connecticut,
Delaware, Florida, Michigan, Ohio, Oklahoma and Rhode Island. Each has an online capability
allowing the user to drill down by surname and given name to view individual records. A total of 2,015 records have been extracted from those nine states for the Ruby Project. Given that the nine states make up about 20% of the US population, we estimate that there may be as many as 10,000 Ruby people in the US!
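The arithmetic behind that estimate can be sketched in a few lines of Python. The two input figures come from this post. The function name `scale_up` is just for illustration, and the calculation assumes that Rubys are spread across the states in proportion to the general population, which is only a rough guess.

```python
def scale_up(records_found, population_share):
    """Scale a count from some states up to the whole country.

    Assumes the surname is distributed like the general population.
    """
    return records_found / population_share


records_found = 2015      # Ruby voter records from the nine states
population_share = 0.20   # the nine states' approximate share of the US population

estimate = scale_up(records_found, population_share)
print(f"Estimated Rubys nationwide: about {round(estimate):,}")
```

Voter lists only include registered voters, so children and unregistered adults are not counted in the starting figure.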
Just extracting the data is not the end of the process, though.  To make the data meaningful, one has to group families living at the same address, which is a process that cannot be mechanized.  For two states (Ohio and Oklahoma), one has to guess the sex of the individual too, which is quite time-consuming and not as simple as it seems given modern naming patterns in America!
The resulting data has led to some insights. First, we have a ready supply of modern information to help with family reconstruction and to add to future obituaries as we spot them. Second, there is a very small proportion of people (well below 1%) who are registered to vote in more than one state! Some of that is no doubt due to people moving around and the timing differences in online databases. Florida’s data is practically up to date, whereas Delaware’s dates from 2015.
Lastly, it’s worth noting that the Florida population has grown from just over half a million people in 1900 to over 21 million today, a roughly 40-fold increase. Meanwhile, the population of Rubys has gone from 9 to over 400, but that’s not even including people below the voting age!  Something about Florida is particularly attractive to people named Ruby!

The FAQs
Are all the individuals in the voter lists living? If so, how does one connect them to a line of Rubys going backward?
All the people are "supposed" to be living but some don't get removed from their state's voter list when they pass away or move out of state. We spotted several duplicate people who have moved between states and one or two who have died since the lists were prepared. The state has to be notified of the person's death or removal. We suspect some states have a period of inactivity  (i.e., the person not voting) which triggers a requirement to re-register but we haven't checked each state's requirements. That people are on the list is sufficient evidence of their existence, however. We can only connect them going backward if we have the records to do so, such as documented birth date and/or marriage and/or name(s) which match. We were able to spot about 100 people who matched with records we already had and merged them in as we added all the data to our master file. 

Do the households identify the relationships between individuals living in the same household? 
No. We have made our best guess about relationships based on the ages and sexes of people living at the same address. On the other hand, since everyone is alive, we are following good genealogical practice and are not publishing the data ourselves, so it is not crucial whether we have got the relationships correct. We have plenty of time to sort out any errors.

What is the span of years of a voter list, for example, the frequency of updating for people deceased, or no longer in the state?
The situation varies by state. Some lists are updated daily (on working days) and some states provide new lists annually or less frequently. All the voter registration records are public information and not subject to the privacy laws you might expect.

How did you move these lists from data to genealogical software or directly to a website database? 
I wrote a program to extract the records for a specific surname (e.g., Ruby) from the massive complete state databases and write them to a CSV (comma- or tab-separated values) file, which can be imported into a spreadsheet. Once the data was imported, I removed the non-genealogical columns (e.g., registration date and past voting history, which records whether the person voted and whether in person or absentee). Words and names in ALL CAPS were then converted to first capital only. Next, the address fields were concatenated into a single residence place string, and first and middle names were concatenated into a given name. Birth dates were converted from the US format, MM/DD/YYYY, into the European format, DD/MM/YYYY, used by most genealogists. All columns were then arranged into a common order expected by the program I wrote to convert the data into a GEDCOM file with name, DoB and residence tags and a source citation for the voter registration database. I also typically add a column with a "sortable" DoB, which can be used to sort the data in true date order; the dates are normally provided only as text, which would otherwise sort by month (US format) or by day (European format). As you can appreciate, this is the technical part, which won't be simple for most genealogists. This is the reason we think the Ruby study has pioneered the use of such information.
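To give a flavour of the cleaning and conversion steps just described, here is a rough Python sketch. It is an illustration only, not the spreadsheet formulas or the program actually used for the Ruby Project. The column names (`first`, `middle`, `street` and so on) and the function names are made up for the example, and the GEDCOM output is a bare-bones outline that has not been validated against the full GEDCOM specification.

```python
from datetime import datetime

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def first_capital(text):
    """'JOHN  JAMES' -> 'John James' (first capital only, spaces tidied)."""
    return " ".join(word.capitalize() for word in text.split())


def convert_dob(us_date):
    """Turn MM/DD/YYYY text into European, sortable and GEDCOM-style dates."""
    d = datetime.strptime(us_date.strip(), "%m/%d/%Y")
    return {
        "european": d.strftime("%d/%m/%Y"),
        "sortable": d.strftime("%Y-%m-%d"),  # sorts correctly as plain text
        "gedcom": f"{d.day} {MONTHS[d.month - 1]} {d.year}",
    }


def clean_row(row):
    """Build the genealogical fields from one raw voter row (hypothetical columns)."""
    return {
        "surname": first_capital(row["surname"]),
        "given": first_capital(f"{row['first']} {row.get('middle', '')}"),
        "sex": row.get("sex", "").strip().upper(),  # blank where the state omits it
        "dob": convert_dob(row["dob"]),
        "residence": (f"{first_capital(row['street'])}, {first_capital(row['city'])}, "
                      f"{row['state'].strip().upper()} {row['zip'].strip()}"),
    }


def to_gedcom(people, source_title):
    """Write a minimal GEDCOM text: name, birth date, residence, one shared source."""
    lines = ["0 HEAD", "1 GEDC", "2 VERS 5.5.1", "2 FORM LINEAGE-LINKED", "1 CHAR UTF-8"]
    for n, p in enumerate(people, start=1):
        lines += [f"0 @I{n}@ INDI", f"1 NAME {p['given']} /{p['surname']}/"]
        if p["sex"]:
            lines.append(f"1 SEX {p['sex']}")
        lines += ["1 BIRT", f"2 DATE {p['dob']['gedcom']}", "2 SOUR @S1@",
                  "1 RESI", f"2 PLAC {p['residence']}", "2 SOUR @S1@"]
    lines += ["0 @S1@ SOUR", f"1 TITL {source_title}", "0 TRLR"]
    return "\n".join(lines) + "\n"
```

Sorting a list of cleaned rows with `sorted(people, key=lambda p: p["dob"]["sortable"])` puts them in true date order, which is the job of the "sortable" DoB column. Note that `first_capital` is deliberately simple: a name like MCDONALD comes out as Mcdonald and would still need fixing by hand.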

OK, I accept this is new and appreciate that you will not be publishing the data. So why bother with it at all?
Good question! Yes, we will NOT be publishing this information online until we know a particular individual has died. However, we have already found it helpful. Let's say we have a daily feed from companies providing obituaries. Many American obituaries contain lots of information on descendants and other relatives who are still alive. If those folks live in one of the nine states, we can immediately connect them up with accurate relationships, birth dates and so on. The same goes for news items: having a large number of living RUBYs allows us to identify many people who appear in news items and make our database a whole lot more interesting! To summarize, look at it this way: with the voter lists in our file we have tomorrow's census releases today!

