Uniquely Determining Identity Using Computer-Based Analysis of Human Speech
by Alan C.

Summary

Given voice samples from a group of individuals, how can we use machine learning to program a computer to identify each individual solely from his or her voice? This research question falls within the area of voice recognition, the identification of a person from the characteristics of his or her voice. In this project, voice recognition is distinct from speech recognition: it identifies who is speaking rather than what is being said. Voice recognition, also called speaker recognition, has been studied for about four decades and relies on acoustic features of speech that differ between individuals.

Speaker recognition consists of two parts. The first is “enrollment,” in which samples of an individual’s speech are recorded and converted into features that can be used to identify that speaker. The second is “verification,” in which a new sample of speech is compared against the model produced during enrollment. By incorporating both stages, we will attempt to build a speaker recognition system that reliably identifies speakers from their voices, regardless of how they speak.

We approach the identification problem with the goal of finding a uniquely determining factor in the human voice that can reliably distinguish one individual from another. Accordingly, our speaker recognition model will be designed with an emphasis on reliability and accuracy in identifying the speaker, given voice samples that faithfully represent that speaker’s voice.

There is a substantial body of literature on this topic.
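The enrollment and verification stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's method: it assumes each recording has already been reduced to a list of numeric feature vectors (one per short frame), and the function names `enroll`, `verify`, and `identify`, the averaged-template model, the cosine-similarity score, and the 0.9 threshold are all hypothetical choices.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

def enroll(frames: Sequence[Vector]) -> Vector:
    """Enrollment (hypothetical model): average a speaker's per-frame feature vectors into one template."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(template: Vector, frames: Sequence[Vector], threshold: float = 0.9) -> bool:
    """Verification: accept a claimed identity if the new sample is close enough to its template."""
    return cosine_similarity(template, enroll(frames)) >= threshold

def identify(templates: Dict[str, Vector], frames: Sequence[Vector]) -> str:
    """Identification: return the enrolled speaker whose template best matches the new sample."""
    probe = enroll(frames)
    return max(templates, key=lambda name: cosine_similarity(templates[name], probe))

# Toy two-dimensional features standing in for real acoustic features (hypothetical data).
templates = {
    "speaker_a": enroll([[1.0, 0.1], [0.9, 0.2]]),
    "speaker_b": enroll([[0.1, 1.0], [0.2, 0.9]]),
}
new_sample = [[0.95, 0.15]]
print(identify(templates, new_sample))              # speaker_a
print(verify(templates["speaker_b"], new_sample))   # False
```

Verification answers a yes/no question about one claimed identity, while identification compares the sample against every enrolled speaker; the research question above is the second kind, but both reuse the same enrolled templates.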
Because our system is built with machine learning, we will draw on sources such as Machine Learning for Audio, Image and Video Analysis by Francesco Camastra and Alessandro Vinciarelli and the documentation for TensorFlow, Google’s open-source machine learning library.

The voice samples for this research will be gathered from a selected group of individuals who will record themselves reading large blocks of text. Although we could obtain voice samples from online sources, we chose to record them ourselves to reduce errors at this stage by controlling variables such as background noise, the relative volume of the recordings, and the text the participants read. This places a larger burden on us than using existing recordings would, but we believe the benefits of this approach outweigh the extra work.

After recording the voice samples, we will convert each audio sample into a waveform, which makes it easier to analyze. We will then use machine learning to train the computer to recognize the various acoustic features of a human voice. During the first month of the project, we will focus primarily on having the computer recognize these features rather than on analyzing them.

In the next step, we will shift our focus to developing programs that learn both to recognize the patterns of the human voice and to discern the distinguishing characteristics of each voice. We will identify the characteristics we want the computer to analyze, choose a machine learning algorithm to apply to our data set, and spend time implementing and adapting that algorithm to fit the project’s needs.
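Converting a recording into a waveform and measuring simple characteristics of each short stretch of it might look like the sketch below. It assumes recordings are saved as mono 16-bit PCM WAV files and uses only Python's standard `wave` and `struct` modules. The frame length and hop (400 and 160 samples, i.e. 25 ms and 10 ms at 16 kHz), the two features (RMS energy and zero-crossing rate), and the helper `make_test_tone` are illustrative assumptions, not the characteristics this project has selected.

```python
import io
import math
import struct
import wave
from typing import List, Tuple

def read_waveform(source) -> Tuple[List[float], int]:
    """Read a mono 16-bit PCM WAV file (path or file object) into samples scaled to [-1.0, 1.0)."""
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch assumes mono 16-bit PCM audio")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw[: count * 2])  # WAV stores little-endian samples
    return [x / 32768.0 for x in ints], rate

def frame_features(samples: List[float], frame_len: int = 400, hop: int = 160) -> List[List[float]]:
    """Slide a window over the waveform and return [RMS energy, zero-crossing rate] per frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append([rms, crossings / (frame_len - 1)])
    return features

def make_test_tone(freq: float = 200.0, rate: int = 16000, seconds: float = 1.0) -> io.BytesIO:
    """Hypothetical helper: synthesize a sine tone as an in-memory WAV file for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        n_samples = int(rate * seconds)
        wf.writeframes(b"".join(
            struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq * n / rate)))
            for n in range(n_samples)))
    buf.seek(0)
    return buf

samples, rate = read_waveform(make_test_tone())
feats = frame_features(samples)
print(rate, len(samples), len(feats))  # 16000 16000 98
```

These two features are chosen only because they are short to compute by hand; speaker recognition systems commonly use richer spectral features such as mel-frequency cepstral coefficients (MFCCs), which would slot into `frame_features` in the same per-frame role.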
By giving participants large blocks of text to read, we aim to collect a wide variety of samples for each voice, which should improve the accuracy with which the selected algorithm identifies each individual. We expect this step to take up most of the remaining time for the project.
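One way to check whether many samples per voice translate into accurate identification is to hold out part of each speaker's recordings and measure accuracy on them. The sketch below uses a k-nearest-neighbour classifier purely as a stand-in, since the algorithm has not yet been chosen; the function names, `k=3`, the every-fourth-sample split, and the toy feature vectors are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (feature vector, speaker label)

def knn_predict(train: Sequence[Sample], x: Sequence[float], k: int = 3) -> str:
    """Label x by majority vote among its k nearest training vectors (Euclidean distance)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train: Sequence[Sample], test: Sequence[Sample], k: int = 3) -> float:
    """Fraction of held-out samples whose speaker is identified correctly."""
    correct = sum(1 for x, label in test if knn_predict(train, x, k) == label)
    return correct / len(test)

def split_per_speaker(samples: Sequence[Sample], test_every: int = 4):
    """Hold out every `test_every`-th sample of each speaker so every voice appears in both sets."""
    train, test, seen = [], [], Counter()
    for vec, label in samples:
        seen[label] += 1
        (test if seen[label] % test_every == 0 else train).append((vec, label))
    return train, test

# Hypothetical toy features: two well-separated speakers with eight samples each.
data = ([([1.0 + 0.05 * i, 0.2], "alice") for i in range(8)]
        + [([0.2, 1.0 + 0.05 * i], "bob") for i in range(8)])
train, test = split_per_speaker(data)
print(len(train), len(test), accuracy(train, test))  # 12 4 1.0
```

Splitting within each speaker, rather than holding out whole speakers, matches the closed-set question posed in this project: every test sample comes from someone the system has already enrolled.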