Linguistics research: Hosting 75+ hours of audio data on a user-friendly, searchable website

The CoNGA research project studies whether bilingual individuals retain elements of their native language, such as their grammar and the way they speak, when living in a different country.

Throughout this project, researchers have collected 135 recordings and transcripts of people speaking, totalling 75 hours of sound files.

The data was all collated in a spreadsheet, including links to the mp3 recordings and transcripts, plus information about the speakers (e.g. native language).

Until now there hasn’t been much of this type of audio data available, so the research team wanted to make the sound files and transcripts publicly accessible.

The Research Support Team developed a customised solution that pulled data from the spreadsheet and transformed it into a user-friendly webpage. Other researchers can access it easily from anywhere in the world to listen to the recordings and read the transcripts. It is also searchable, allowing users to filter by bilingual language pair and other variables.

“This has been a great collaborative project with the Research Support Team. We wanted to make our data user friendly and transform how you can access linguistic data in this research field. Our needs have been fully met and we are very happy.”

Laura Dominguez, Head of Department, Languages, Cultures & Linguistics

A summary of the technology

The problem

Many research projects have reams of data collated in a spreadsheet. That data is valuable to other researchers and to wider society, but it is difficult to make available for them to use. Spreadsheets are hard to share and are not always user friendly.

The solution

The Research Support Team has developed a customised solution that will pull data from a spreadsheet and transform it into a web page. This is public-facing and user friendly, with a simple search function. It updates automatically when new data is added.
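
The article does not describe how the Research Support Team built its solution, so the following is only a minimal sketch of the general idea: read rows exported from a spreadsheet, filter them by a variable such as native language, and render the result as a web page with a player and transcript link for each recording. The column names, sample rows, example.org URLs and function names are all hypothetical and chosen purely for illustration.

```python
import csv
import html
import io

# Hypothetical spreadsheet export; the real column headers and data
# are not given in the article.
SAMPLE_CSV = """speaker_id,native_language,second_language,audio_url,transcript_url
S001,Spanish,English,https://example.org/audio/S001.mp3,https://example.org/transcripts/S001.txt
S002,Polish,English,https://example.org/audio/S002.mp3,https://example.org/transcripts/S002.txt
S003,Spanish,English,https://example.org/audio/S003.mp3,https://example.org/transcripts/S003.txt
"""


def load_rows(csv_text):
    """Parse spreadsheet data exported as CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_rows(rows, **criteria):
    """Keep rows whose columns match every criterion (case-insensitive)."""
    return [
        row
        for row in rows
        if all(
            row.get(column, "").lower() == str(value).lower()
            for column, value in criteria.items()
        )
    ]


def render_page(rows):
    """Render rows as a minimal HTML table with audio and transcript links."""
    table_rows = []
    for row in rows:
        # Escape every value so spreadsheet content cannot break the markup.
        audio = html.escape(row["audio_url"])
        transcript = html.escape(row["transcript_url"])
        table_rows.append(
            "<tr>"
            f"<td>{html.escape(row['speaker_id'])}</td>"
            f"<td>{html.escape(row['native_language'])}</td>"
            f"<td>{html.escape(row['second_language'])}</td>"
            f'<td><audio controls src="{audio}"></audio></td>'
            f'<td><a href="{transcript}">Transcript</a></td>'
            "</tr>"
        )
    header = (
        "<tr><th>Speaker</th><th>Native language</th>"
        "<th>Second language</th><th>Recording</th><th>Transcript</th></tr>"
    )
    return (
        "<!DOCTYPE html>\n<html><body>\n<table>\n"
        + header
        + "\n"
        + "\n".join(table_rows)
        + "\n</table>\n</body></html>\n"
    )
```

In this sketch, calling render_page(filter_rows(load_rows(SAMPLE_CSV), native_language="Spanish")) would produce a page listing only the Spanish-speaking participants. Regenerating the page each time the spreadsheet changes is one simple way to keep the site up to date, though the real system may work differently.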

The impact

Valuable research data is made widely available within society, where it can be used by businesses, educators, policy makers, researchers and other interested groups.