How FamilySearch is using the future to discover the past with AI

FamilySearch has released more than 2.6 billion historical resources to the public, and according to John Alexander, a senior product manager there, many more are in the pipeline. It’s just a matter of having the documents transcribed.

More than 5 billion other documents—collected and converted into digital images—must be transcribed before they can be searched and used in the FamilySearch database.

And every day 1 to 2 million more are added.

New artificial intelligence technologies are raising hopes that billions of those records can be made available within just five years to families looking for information about their relatives. The technology is already being tested and deployed.
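To give a sense of scale, the figures above can be turned into a rough daily target. This is only a back-of-the-envelope sketch: it assumes the goal is to clear the entire 5 billion image backlog in five years while intake continues at 1 to 2 million images a day, which the article does not state outright. The function name and the 365-day year are illustrative assumptions.

```python
# Back-of-the-envelope estimate using the article's round figures.
BACKLOG = 5_000_000_000                 # images awaiting transcription (from the article)
DAILY_INTAKE = (1_000_000, 2_000_000)   # new images added per day (from the article)
YEARS = 5                               # time window mentioned in the article
DAYS = YEARS * 365                      # assumption: leap days ignored


def required_daily_rate(backlog: int, intake_per_day: int, days: int) -> float:
    """Images to index per day to clear the backlog and keep up with new intake."""
    return backlog / days + intake_per_day


for intake in DAILY_INTAKE:
    rate = required_daily_rate(BACKLOG, intake, DAYS)
    print(f"intake {intake:,}/day -> about {rate / 1e6:.1f} million images/day")
```

Under those assumptions the required pace works out to roughly 3.7 to 4.7 million images a day, which suggests why volunteer indexing alone cannot keep up.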

Jane Pierce reviews her family history on Thursday, March 16, 2023 at FamilySearch headquarters in Lehi.

Jeffrey D. Allred, Deseret News

“In just a few hours, the computer can index more than you or I could do in a lifetime if we did nothing but index for the rest of our lives,” Alexander said. “So in terms of efficiency, it’s very fast.”

So far the AI has been taught English, Spanish and Portuguese (yes, “taught,” like a child), with plans to add Italian in 2023.

Teaching AI to read handwriting

It takes time to get a sophisticated system like AI transcription working because it needs to be taught and trained, Alexander said.

“When we show the image to a computer, it just sees ones and zeros – pixels,” he explained. “None of this means anything to the computer.”

“And so we have to teach it, much as we would teach a child to read. We have to teach it every single letter and every single character, we have to teach it how pages are laid out and how each line differs from the next,” he continued. “All of this takes time and training.”

John Alexander, a senior product manager, digitizes historical property records at FamilySearch headquarters in Lehi on Thursday, March 16, 2023.

Jeffrey D. Allred, Deseret News

On top of that, the AI is starting to read manuscripts from the 14th century, which can be difficult for human volunteers.

Once the AI has been taught one language, the other languages in the same family become much easier to teach. It’s kind of an exponential process.

Faster than the team could have imagined, the AI has read and indexed, in just a few months, documents in English, Spanish and Portuguese that would take human indexers half a century to complete.

Drowning in documents

Unless indexing of these documents is sped up, the records will never become available to the public, and making them available is FamilySearch’s goal.

“A large part of what FamilySearch does is collect and store historical records of the world,” Alexander continued, “particularly those records that tell about people and their relationships to one another.”

Since the 1940s, FamilySearch has sent cameras to different parts of the world to capture and preserve historical documents, such as old parish registers that record births, baptisms, marriages and burials, or censuses that record the people living in each household.

Last year they completed the digitization of over 60 years of microfilm documents.

“Here’s the problem we had. In order for these images to be useful and accessible for people to find their families, they need to be indexed,” he said.

Alexander said that only about 20% of the documents FamilySearch has collected and imaged are readily searchable on the site, and the organization is rapidly losing ground because documents are being imaged and stored faster than the information can be pulled off the page.

Indexing, as Alexander described it, is a human-powered process performed by volunteers, who view the image and fill out a digital form with the information from the historical document. This allows the information to be placed in the online database and searched by any user anywhere in the world.

But it’s limited because people can only work so fast.

“Although we’ve tried to expand FamilySearch indexing and include more people, digitization has accelerated so much that we can’t keep up,” Alexander said. “We can’t index all the images that come through the door.”

Patrons work on family history on Thursday, March 16, 2023 at FamilySearch headquarters in Lehi.

Jeffrey D. Allred, Deseret News

Quality control

Whenever AI is involved, some skepticism enters the equation. A frequently asked question about the whole AI transcription process is “how accurate is it?”

“The computer makes mistakes,” said Alexander. “It might read the name of a street and think that’s the baby’s name – stuff like that.”

But FamilySearch, he added, is very committed to high-quality records and plans to maintain that quality for the site’s users. Records that do not meet a certain quality threshold will not be published on the website.

“If we didn’t worry about quality, we could work much, much faster with the computer,” Alexander said. “But we pay a lot of attention to quality.”

FamilySearch has processed far more documents than it has made available to the public, because each document is graded on its accuracy and must clear a quality threshold before it is published. That checking depends on quality control volunteers.
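The threshold idea can be pictured as a simple routing step. The sketch below is illustrative only: the article does not describe FamilySearch’s actual grading scale, cut-off or software, so the 0.95 threshold, the 0-to-1 grade and the names `IndexedRecord` and `route` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the article gives neither the real threshold nor the scale.
PUBLISH_THRESHOLD = 0.95


@dataclass
class IndexedRecord:
    record_id: str
    grade: float  # assumed accuracy grade between 0.0 and 1.0


def route(records, threshold=PUBLISH_THRESHOLD):
    """Split records into those fit to publish and those held for volunteer review."""
    publish, review = [], []
    for record in records:
        (publish if record.grade >= threshold else review).append(record)
    return publish, review


sample = [
    IndexedRecord("a", 0.99),
    IndexedRecord("b", 0.80),
    IndexedRecord("c", 0.95),
]
publish, review = route(sample)
print([r.record_id for r in publish])  # ['a', 'c']
print([r.record_id for r in review])   # ['b']
```

In this toy version, anything below the cut-off is held back rather than discarded, which mirrors the article’s point that more documents have been processed than published.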

Human indexers are not out of work

Quality control runs through a volunteer program on the site’s Get Involved page, launched at RootsTech 2022, which lets volunteers sift through the indexed information and verify its accuracy.

Based on what the AI has read, suggestions are made to the volunteer to accept, decline or skip the information as shown in the image below.

This is a screenshot of a Get Involved task that FamilySearch volunteers can work on. The AI reads the document and makes suggestions to the volunteers as part of the quality control process.

But until the AI process is perfected, volunteers are still needed to “make sure the computer is indexing correctly.”

“Computer automation will not replace our volunteer indexers,” Alexander said. “If anything, we need more of them.”

Although AI transcription is still in its infancy, the FamilySearch team is very optimistic about what it means for genealogical work.

“The amount of information available to genealogists, researchers, people who want to discover and find their families, and the search systems behind them is going to be enormous in the future,” Alexander said. “And we’re going to have really wonderful experiences with this technology, especially in the area of family history.”

John Alexander, a senior product manager, holds an old microfilm that was digitized at FamilySearch headquarters in Lehi on Thursday, March 16, 2023.

Jeffrey D. Allred, Deseret News