Unlocking methods of many kinds are now in constant use. In biometrics, the most popular are face recognition and fingerprint recognition, which are applied across many fields; in the information age, security has become extremely important. Riding the wave of face recognition, voiceprint recognition is poised to bring a new opportunity to the security industry.
At the end of the year, financing news is plentiful, with rounds starting at a million and no upper limit. Among them, one unremarkable-looking round of tens of millions of RMB has drawn attention.
In the past few years, under the combined influence of the market, technology, and policy, AI has had its ups and downs and has spread into various industries. From an application point of view, most AI startups have gone to market with face recognition, while SpeakIn (potentially wins technology) has taken a different approach: it uses voiceprint recognition as a blade to cut through the barriers of various industries and empower traditional ones, and it has recently secured a new round of financing.
Using voiceprint recognition to break into the security circle
SpeakIn was founded in Silicon Valley in 2015 and focuses on the voiceprint as a biometric ID. Its commercialization consists of four major blocks: 1. Security; 2. People's livelihood; 3. Finance; 4. Intelligent hardware. Multiple products and solutions have been launched for these scenarios.
SpeakIn COO Yi Pengyu said that, of these, security has the highest strategic priority for the company. There are two main applications:
Prevent telecom fraud. Local public security departments now have personal information collection systems into which face, fingerprint, voice and other information is entered. If a case involves a voice recording, the suspect can be found more easily through the system.
Help find lost children. If someone is suspected of having been trafficked, their voice is entered into the system; as long as earlier video and voice recordings of the missing child exist, comparing the two can determine whether this is the child. Yi Pengyu explained that this is a new tool for public security departments, although it is hindered by the age span.
In June this year, SpeakIn also cooperated with the public security department to establish a "synthetic laboratory for intelligent voiceprint system", jointly investing in research and development of advanced products and systems for public security, and using voiceprint recognition technology to provide services and safeguards for social stability and national security.
Voiceprint recognition: past and present
The reporter learned that the technology originated at Bell Labs in the 1940s. It distinguishes unknown voices by analyzing the characteristics of one or more speech signals. Simply put, it is the technique of determining whether a given utterance was spoken by a given individual.
It is often used in criminal investigation, tracking of criminals, defense monitoring, personalized applications, and so on. It mainly extracts features such as the basic spectrum and envelope of the speaker's voice, the energy of pitch frames, and the occurrence frequency and trajectory of pitch formants, characterizes them, and combines them with traditional matching methods such as pattern recognition to perform voiceprint recognition.
Yi Pengyu told reporters that few cases have fingerprints, faces and voices all at once. In the Internet era, many crimes are committed through networked channels such as WeChat and the telephone, so voice has become the most obvious point of breakthrough.
The public security department realized this long ago. The reporter learned that China established a voiceprint identification center many years ago. Experienced experts used very traditional software to examine spectrograms, spending five or even ten hours listening to a single voice. Dialect and accent, stress, rhythm, pronunciation habits, swallowed sounds in running speech, and the fundamental frequency of the voice are among the voiceprint features that a human had to judge before the voiceprint expert assistant system existed. A case might not be finished in a week, so case-handling efficiency was very low.
Even so, this mode of operation is currently widely used in various public security departments.
In theory, voiceprints, like fingerprints, are unique biometric features, and they have been widely used in the United States. It is reported that the US Federal Bureau of Investigation compiled statistics on 2,000 voiceprint-related cases and found an error rate of only 0.31% when voiceprints were used as evidence. To date, the technology has helped US police solve thousands of cases, providing effective clues and evidence for case handlers.
Compared with the United States, the promotion and use of this technology in China has clearly been somewhat slow. The main reason is that the technical immaturity of the relevant domestic companies has left voiceprint recognition shut out of public security work. "Voice is one of the most natural ways for humans to interact, but compared with face recognition, voiceprint technology has not had much of a breakthrough in large-scale recognition over the past few years," Yi Pengyu said.
In his opinion, it is time to fully apply it.
From a technical point of view, when searching for a single voice in a library of 100,000 voiceprints, SpeakIn can achieve a Top-10 (ranked by similarity) hit rate of 99%. As far as the reporter understands, the country's largest voiceprint library holds about 50,000 to 60,000 entries. On that basis, there is a very high probability of helping the police solve a case.
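To make the Top-10 hit-rate figure concrete, here is a minimal sketch of how such a metric could be computed. It assumes each voice has already been converted to a fixed-length numeric vector (an embedding) and that similarity is measured by cosine similarity; the function names, the toy data and the scoring choice are illustrative assumptions, not SpeakIn's actual method.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(query, library, n=10):
    """Return the n (speaker_id, score) pairs most similar to the query."""
    scored = [(sid, cosine(query, emb)) for sid, emb in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def top_n_hit_rate(trials, library, n=10):
    """Fraction of (true_id, query) trials whose true speaker is in the Top-N."""
    hits = sum(
        any(sid == true_id for sid, _ in top_n(query, library, n))
        for true_id, query in trials
    )
    return hits / len(trials)


# Toy data (hypothetical): 200 random 16-dimensional "voiceprints".
rng = random.Random(0)
library = {f"spk{i}": [rng.gauss(0, 1) for _ in range(16)] for i in range(200)}

# Each trial is a slightly noisy re-recording of a known speaker.
trials = [
    (sid, [x + rng.gauss(0, 0.1) for x in emb])
    for sid, emb in list(library.items())[:20]
]

rate = top_n_hit_rate(trials, library, n=10)
print(f"Top-10 hit rate on toy data: {rate:.0%}")
```

A Top-10 hit means only that the true speaker appears somewhere among the ten best candidates, so a human examiner would still have to review the shortlist.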
How does voiceprint recognition cut into the security industry?
In many exchanges with the public security department, Yi Pengyu found that its needs are very clear: they center on the ability to "break the case", and to do so quickly, conveniently and intelligently.
He explained that once a public security department adopts voiceprint recognition, the voice from a case is entered into the system (the intelligent voiceprint identification expert assistant system) and compared with the voices in the library; machine learning then splits the voice into multiple phonemes and spectrograms. After about five minutes, the system can determine whose voice it is and return a ranked result (TOP value) to improve case-handling efficiency.
This is mainly thanks to two "contributors":
One is a multi-channel microphone device for sound collection, loaded with multi-channel pulses, in which eight microphones can collect audio from different channels, including WeChat, telephone, mobile phones and so on. Yi Pengyu revealed that these microphones need to be integrated into one device. The sound from each channel differs slightly. With this technology, the system can accurately distinguish whether audio came over a mobile 3G signal or a mobile 4G signal.
The other is a software system called an authentication workstation. It covers sound collection, entry, comparison, identification and recording of results, backed by a set of locally deployed private cloud services.
Overall, SpeakIn ultimately provides a complete set of solutions and services for public security clients.
How usable is voiceprint recognition?
Dr. Chen Xiaoliang, CEO of Shengzhi Technology, said in an interview with reporters that most research now concerns real-time detection of dynamic voiceprints. Dynamic detection naturally draws on the principles of static detection, and many other algorithms need to be added, such as VAD, noise reduction and de-reverberation. The purpose of VAD is to detect whether a human voice is present, while noise reduction and de-reverberation remove environmental interference. This matters not only for voiceprint detection but even more for speech recognition.
Two VAD methods are in common use: energy-based detection and LTSD (Long-Term Spectral Divergence); most current methods use LTSD. In addition, feature extraction requires dynamic time warping (DTW), vector quantization (VQ) and support vector machines (SVM), and modeling requires Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM).
Although voiceprints are highly distinctive, existing equipment and technology still struggle to distinguish them accurately, especially because the human voice is variable and susceptible to physical condition, age, mood and so on. The main factors include:
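As an illustration of the simpler of the two VAD approaches mentioned, the following is a minimal sketch of energy-based detection on a synthetic signal. The frame length, the peak-relative threshold and the function names are assumptions chosen for the example; LTSD, which compares long-term spectral envelopes against a noise estimate, is not shown.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_vad(samples, frame_len=160, threshold_ratio=0.1):
    """Flag a frame as active if its energy exceeds a fraction of the peak.

    The peak-relative threshold is an assumption for this sketch; real
    systems usually track a running estimate of the noise floor instead.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    return [e > threshold for e in energies]


# Synthetic 8 kHz signal: 0.1 s near-silence, 0.1 s loud tone, 0.1 s near-silence.
rate = 8000
quiet = [0.001 * math.sin(2 * math.pi * 50 * t / rate) for t in range(800)]
tone = [0.5 * math.sin(2 * math.pi * 200 * t / rate) for t in range(800)]
signal = quiet + tone + quiet

flags = energy_vad(signal)  # one boolean per 20 ms frame
print("".join("#" if f else "." for f in flags))
```

Energy alone cannot tell speech from any other loud sound, which is one reason the more robust LTSD approach is generally preferred in noisy conditions.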
1. External noise;
2. Multiple people speaking at once;
3. Physical condition;
4. Emotional influence.
Yi Pengyu also admits that voiceprint recognition is indeed demanding on the environment, and that noise interference remains a difficult problem in the audio field (not only for voiceprints: semantic speech recognition also faces noise and similar problems).
He also stressed that voiceprint recognition serves as an entertainment feature in areas such as smart hardware; in public security, finance and other fields, it is not a standalone or preferred authentication method, and it coexists with other biometric methods. Different types of biometrics have their own advantages. In many cases they are used together and complement one another; they are by no means mutually exclusive.
It is worth mentioning that, as the technology has matured, the Ministry of Public Security has officially promulgated an industry standard, the Technical Requirements for Security Voiceprint Identification Application System, for the procurement of such devices. In other words, voice can also be accepted as evidence.
Summary
"Unlike face recognition, which has large-scale databases, the difficulty of voiceprint recognition is that the voiceprint library still needs to be expanded. In addition, the audio field has technical difficulties such as the cocktail party problem. To do this, you must be patient," Yi Pengyu said.
Indeed, deep learning is a data-driven approach. Like face recognition, training voiceprint recognition requires a huge accumulation of data and accurate labeling of that data.
Compared with face recognition, voiceprint recognition is more difficult.
A voiceprint recognition training library must at least ensure a gender ratio of 50%±5% and include different age groups, regions, accents and occupations. At the same time, the test samples should cover the main factors that affect voiceprint recognition performance, such as text dependence, acquisition equipment, transmission channel, environmental noise, replayed recordings, voice imitation, time span, sample duration, health status and emotional factors.
In other words, voiceprint recognition is actually much harder than speech recognition, and this road will be difficult. Fortunately, there are AI startups like SpeakIn that insist on being pioneers. In Yi Pengyu's words, "Isn't this what entrepreneurs need to do?"
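The 50%±5% gender requirement is simple enough to check mechanically. The sketch below assumes each speaker is a record with a `gender` field holding `"female"` or `"male"`; the record layout and the function name are hypothetical and serve only to illustrate the arithmetic.

```python
from collections import Counter


def gender_balanced(speakers, target=0.5, tolerance=0.05):
    """Return True if the female share is within target ± tolerance.

    With two gender labels, the male share is then within the same band.
    """
    counts = Counter(s["gender"] for s in speakers)
    total = sum(counts.values())
    if total == 0:
        return False
    female = counts.get("female", 0)
    # Compare counts rather than ratios; the small epsilon guards the boundary
    # against floating-point rounding.
    return abs(female - target * total) <= tolerance * total + 1e-9


# Hypothetical training libraries.
balanced = [{"gender": "female"}] * 52 + [{"gender": "male"}] * 48
skewed = [{"gender": "female"}] * 70 + [{"gender": "male"}] * 30

print(gender_balanced(balanced))
print(gender_balanced(skewed))
```

The same pattern could be extended to the other coverage criteria, such as age group or region, by counting over a different field.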