AIR 036 | Tsinghua University Sun Fuchun: 8 Years of Visual and Auditory Cognition

On August 13th, at the intelligent driving session on the second day of the CCF-GAIR Summit, Professor Sun Fuchun of Tsinghua University presented a review of the eight-year major research program on the cognitive computing of visual and auditory information, funded by the National Natural Science Foundation of China (NSFC).

Sun Fuchun stated that after eight years of argumentation, the project was established at the NSFC in 2008 as a major research program on visual and auditory cognition. It ran for eight years, from 2008 to 2017. Its purpose is to study the cognitive mechanisms of human vision and hearing, develop new and efficient computational models, and improve computers' ability and efficiency in understanding the images, speech, and text related to human visual and auditory perception. The program centers on the basic scientific issues of how cognitive processes are expressed and computed:

What are the cognitive mechanisms of human visual and auditory perception?

The extraction, expression, and integration of basic perceptual features

Machine learning and understanding of perceptual data

Cross-modal information coordination and computing

Their main task is to study how people perceive audiovisual information: How is it encoded in the brain? How do brain areas cooperate to fuse it? These findings are then turned into computable models, so that auditory and visual information can be encoded, processed through computational models to perceive and understand the environment, and that understanding compared with human understanding. Finally, the technology is applied to driverless driving.

In the speech, Sun reviewed the program's important results in audiovisual perception since 2008. Beginning in 2009, the program has held a total of seven unmanned vehicle challenges. Through this platform, the team hopes to:

Publish more results in cognitive science.

Integrate natural language understanding and brain-computer interfaces into the vehicle platform; many of these results are still in the lab.

Build on the significant progress made with the unmanned vehicle platform to further promote innovation and lead the development of the unmanned vehicle industry.

Vision for the future

1. How to turn the results of cognitive mechanism research into computable models. Many methods have been explored and need further improvement.

2. How context-aware topological information is expressed and understood in the cognitive process.

3. Exploring emerging multimodal sensors, such as sensors that integrate audio and video information.

4. Studying hybrid human-machine intelligence. This is also part of the 'Artificial Intelligence 2.0' initiative recently proposed at the national level; we must study hybrid human-computer intelligent systems.

It is hoped that more results in cognitive science can be published through the unmanned vehicle challenge platform.

The following is a transcript of the speech:

Distinguished guests, ladies and gentlemen, good morning! If one day you find yourself in a moving car with no driver in the cab, or with a driver whose hands are off the steering wheel, do not be shocked: we have entered the era of unmanned driving. It may be hard to imagine, but from Changsha to Wuhan, over a 286-kilometer route driven through both rain and sunshine, manual intervention accounted for only 0.75% of the distance, and the roughly 150-kilometer Beijing-Tianjin section was driven fully autonomously, with no human intervention at all. Perhaps harder still to imagine, an autonomous vehicle threaded a 2.08-meter gap between roadblocks with only 11 centimeters to spare, five times more efficiently than a human driver.

Today we bring you the story of the past eight years of the NSFC major research program on the cognitive computing of visual and auditory information.

I have likened the cognitive computing of visual and auditory information to the eight-year War of Resistance: after eight years of argumentation, the project was established at the NSFC in 2008 as a major research program on audiovisual cognition. To this day, we remain grateful to Academicians Zheng Nanning, Li Deyi, Chen Lin, and Sun Jiaguang.

Let us look first at visual information. This picture comes from an article published in 1997. It shows how much nature favors vision: from our eyes to the visual cortex runs a long pathway, linking the sensing part to the information-processing part. Think how long that sensing and processing chain is; neither touch nor hearing has anything so long, which is why the eyes are called the windows to the soul. About 80% of the information we humans get from the outside world is visual, and about 60% of our cortex is related to vision.

Hearing is also a very important part. Natural images can be encoded with sparse basis functions, and in recent years we have even discovered an isomorphism between touch and vision. This suggests that in the future we could convert images from an artificial camera into tactile codes, allowing the blind to feel the outside world. In the past two years artificial retinas have also emerged, hoping to open another channel. We have likewise found in the past two years that speech has particularly good denoising properties under sparse coding. Does speech also have a touch-like underlying structure? That is a question we want to study. The audiovisual information in this program refers to the images, speech, and text related to human visual and auditory perception.
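For readers unfamiliar with sparse coding, the sketch below illustrates the idea mentioned above: learning an overcomplete dictionary in which each signal is reconstructed from only a few active atoms. It is illustrative only; the synthetic random patches, the scikit-learn routine, and all parameters are stand-in assumptions, not the program's own models.

```python
# Minimal sparse-coding sketch (illustrative only): synthetic patches stand
# in for natural image patches, and the dictionary/penalty choices are
# arbitrary assumptions rather than the program's actual models.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
patches = rng.standard_normal((500, 64))         # 500 fake 8x8 "patches"
patches -= patches.mean(axis=1, keepdims=True)   # remove each patch's DC term

learner = MiniBatchDictionaryLearning(
    n_components=100,   # overcomplete: 100 atoms for 64-dimensional patches
    alpha=1.0,          # sparsity penalty: few atoms active per patch
    random_state=0,
)
codes = learner.fit_transform(patches)           # sparse coefficients
print("average active atoms per patch:", (codes != 0).sum(axis=1).mean())
```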

We encounter a great deal of audiovisual information in daily life, gathered by all kinds of information devices: cell phones, cameras, network cameras, satellite remote sensing, and so on. People today live in a ternary world. Which three parts? The network world, the physical world, and the shared knowledge world. Before the network was born, we lived in a binary world. How do people work in the ternary world? I once asked a student to translate two English sentences for me and told him tomorrow would be fine; he handed them back immediately, having put them through Google. In the past it was hard to imagine a robot having global intelligence; its intelligence was local. Today, through the network world, robots can obtain global intelligence.

Take a simple example. An autonomous vehicle can go to an unfamiliar environment, say from the airport to our venue. It simply finds a path on the Internet: it can search Sogou Maps and plan the route, and its camera can recognize the venue's appearance, find it, and drive in. This is what the Internet brings us. The Internet holds what we call massive audiovisual perception data; effectively mining it and turning it into usable knowledge is a very important question for unmanned vehicles.

Consider audio and video surveillance, video first. Beijing, for example, has more than a million cameras. How much information is that? One hour of footage is equivalent to the sum of all of CCTV's programming, and processing such a volume of information in a timely way is difficult. On the audio side, a simple figure: incoming calls from overseas Falun Gong sources reach 500,000 per day, with a daily call volume of 400 million minutes. Handling such information effectively is very difficult.

Now compare robots and people. For structured information, such as reports or student registration records, the machine's capacity far exceeds a person's. But for unstructured information, such as the auditory and visual information encountered in a car, human ability is far above the machine's. We can quickly find a friend we know in a crowd, and a person can drive in very complicated environments. So although machines keep getting faster at computation, the computer's cognitive ability lags far behind; it is below even that of a three-year-old child.

Compare also how humans and machines process information: human perception is highly parallel, and people are particularly good at cross-modal information. Everyone knows that the visual and auditory cortices of a cat overlap, but in humans they are separate. How do vision, hearing, and touch cooperate? Why do people have such strong holistic and selective abilities? A person can spot their mother in a crowd; twins that a machine mistakes are still told apart by mother and child; and a child knows what its mother means, which is difficult for a machine.

The guiding ideology of our work on visual and auditory cognition is to study how people perceive audiovisual information: How is it encoded in the brain? How do brain areas cooperate to fuse it? We turn these findings into computable models, so that we can encode auditory and visual information, process it through computational models to perceive and understand the environment, and compare this understanding with human understanding. Finally, the technology is applied to driverless driving.

Let me introduce our major program. It ran from 2008 to 2017, eight years in total. Our goal is to study the cognitive mechanisms of human vision and hearing, develop new and efficient computational models, and improve computers' ability and efficiency in understanding the images, speech, and text related to human visual and auditory perception, making important contributions to national security and the national economy. Focusing on this demand, we targeted the basic scientific issues of how cognitive processes are expressed and computed.

First, the extraction, expression, and integration of basic perceptual features. Here we mainly explore the mechanisms by which humans extract, express, and integrate the basic features of visual and auditory information, laying the foundation for the corresponding high-efficiency computational models.

Second, machine learning and understanding of perceptual data. Image, speech, and language data are unstructured or semi-structured, which makes it difficult for computers to move from the data layer to the semantic layer; establishing new machine learning methods is an effective way to achieve this transformation.

Third, cross-modal information coordination and computing. Visual and auditory information are dynamic sequences; how can they be expressed? In the form of motion manifolds: one manifold for the streaming visual information, another for the auditory information. Audio-video fusion first needs to find the common parts of these two information manifolds, which we call the fusion information; then we can work on the fusion information.

Looking at multimodal fusion: the basis functions of two sensors' information are not the same; the function bases of image and sound differ. Here the concept of group sparsity comes in: if the difference between the two sets of basis functions is relatively small, we can find their common part. This is the principle of group sparse coding. Our expected result was frontier research on these three core scientific issues, and after eight years of effort we have made much progress on the basic theory of audiovisual information cognition. Breakthroughs were achieved in three key technologies: collaborative computing of visual and auditory information, natural language understanding, and brain-computer interfaces related to audiovisual cognition. We also created two international competitions: the Future Challenge for unmanned vehicles and a brain-computer interface competition.
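To make the group-sparsity idea concrete, here is a minimal sketch of joint sparse coding over two modalities. Each modality keeps its own dictionary, and an L2,1 group penalty pushes both to activate the same atoms, so the shared support plays the role of the common part of the two manifolds. The ISTA solver, the dictionaries, and all parameters below are illustrative assumptions, not the program's actual algorithm.

```python
# Hypothetical sketch of group sparse coding across two modalities.
import numpy as np

def group_soft_threshold(C, tau):
    """Shrink each column of C; one column = one atom shared by both modalities."""
    norms = np.linalg.norm(C, axis=0, keepdims=True)
    return C * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def joint_sparse_code(x_v, x_a, D_v, D_a, lam=None, n_iter=300):
    """ISTA for 0.5||x_v - D_v c_v||^2 + 0.5||x_a - D_a c_a||^2 + lam*sum_k ||(c_v[k], c_a[k])||."""
    C = np.zeros((2, D_v.shape[1]))               # row 0: visual, row 1: auditory
    if lam is None:                               # default: a fraction of lambda_max
        corr = np.vstack([D_v.T @ x_v, D_a.T @ x_a])
        lam = 0.1 * np.linalg.norm(corr, axis=0).max()
    L = max(np.linalg.norm(D_v, 2), np.linalg.norm(D_a, 2)) ** 2   # Lipschitz bound
    for _ in range(n_iter):
        grad = np.vstack([D_v.T @ (D_v @ C[0] - x_v),
                          D_a.T @ (D_a @ C[1] - x_a)])
        C = group_soft_threshold(C - grad / L, lam / L)
    return C

# Toy usage: both modalities are generated from the same 5-atom support.
rng = np.random.default_rng(0)
D_v, D_a = rng.standard_normal((64, 128)), rng.standard_normal((32, 128))
c_true = np.zeros(128)
c_true[rng.choice(128, size=5, replace=False)] = rng.standard_normal(5)
C = joint_sparse_code(D_v @ c_true, D_a @ c_true, D_v, D_a)
atom_norms = np.linalg.norm(C, axis=0)
print("shared atoms found:", np.flatnonzero(atom_norms > 0.1 * atom_norms.max()))
```

The group threshold acts on whole columns, so an atom is either used by both modalities or by neither; that shared support is the "common part" the fusion step looks for.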

Let's take a look at our achievements. We have three National Natural Science Awards.

(PPT)

The driving brain is an outstanding result of our research over the years. Its main idea is to model human driving experience. One part is offline: studying how we humans make decisions in such environments, and learning the cognitive abilities people build up through long-term experience. In modeling people's visual and auditory cognitive abilities, we need to strip out the emotions that arise during driving: whatever emotions people may have at the wheel are removed from our cognitive model.

Looking further, this is the human driving brain. It has long-term memory and personality; a person's character determines whether they drive conservatively or assertively. Long-term memory holds the experience and skills people form through long-term driving. Motivation completes the one-time path planning for the travel task from start to finish. Short-term memory is the driver's selective attention, focusing only on the recent and current surrounding driving situation. Emotion is excluded: the emotional part of the human mind is kept out of the driving brain, so it is never distracted; the robot is always focused. People have seen a driver crash because, passing along a street, he noticed a picture of a very expressive girl on a building. Today's unmanned vehicle eliminates this phenomenon. The second part is learning and thinking, such as SLAM, completing secondary planning through memory matching to determine the behavior of the next moment. This is the concept of the driving brain.

Taking this concept through its offline and online parts, we obtain the following schema. Our eyes and ears sense the external environment. Based on the perceived environmental information, such as where we are and what obstacles and targets are nearby, the long-term memory area decides how to drive in this situation; this is called an action. The action is then compared against the perceived information: did it achieve the intended result? This forms a closed loop from dynamic perception to situation analysis, autonomous decision-making, precise control, and online action. On this basis we drew a map of the driving brain: the front part is the perception domain, the planning part is the cognitive domain, and the final part is the action domain.

On the driving car, for example, we have GPS, radar, and optical sensors (we generally do not use GPS in the competitions). These form long-term and short-term memories, and the perceptual information is fused into a driving situation map. A very important concept here is right of way: the space the car itself occupies as the traffic situation forms. On this basis an autonomous decision is made, for example how much the speed should change and how much the steering angle should change, together with a decision memory pool. The control module then controls the unmanned vehicle. This forms a closed loop from perception to decision-making to control, implemented on NVIDIA's DrivePX autonomous driving platform.
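The closed loop described here can be pictured with a toy sketch. Everything below is hypothetical: the sensor values, the two-second right-of-way rule, and the mock control step are illustrative stand-ins, not the team's DrivePX implementation.

```python
# Toy perception -> cognition -> action loop (all names and numbers invented).
from dataclasses import dataclass

@dataclass
class SituationMap:
    gap_ahead_m: float   # free space ahead: a crude "right of way"
    speed_mps: float     # current vehicle speed

def perceive(gap_ahead_m: float, speed_mps: float) -> SituationMap:
    """Perception domain: fuse (mock) sensor readings into a situation map."""
    return SituationMap(gap_ahead_m, speed_mps)

def decide(s: SituationMap) -> float:
    """Cognitive domain: choose a speed change from the situation map."""
    safe_gap = 2.0 * s.speed_mps                       # keep ~2 s of right of way
    return 0.5 if s.gap_ahead_m > safe_gap else -1.0   # m/s adjustment

def act(speed_mps: float, delta: float) -> float:
    """Action domain: apply the decision through the (mock) control module."""
    return max(0.0, speed_mps + delta)

speed, gap = 10.0, 40.0                          # initial state
for step in range(5):                            # the closed loop
    speed = act(speed, decide(perceive(gap, speed)))
    gap -= speed * 0.5                           # environment advances one 0.5 s tick
    print(f"step {step}: speed={speed:.1f} m/s, gap={gap:.1f} m")
```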

(PPT)

This is the final wheeled robot learning the driving process from an experienced driver. The left side shows past driving experience, the human driving experience; the rightmost part is our driverless car. Through perception it forms pose estimates and achieves awareness of the driving situation, then extracts features and forms memories. The current cognition, the driving situation map formed from the road and visual information during driving, is matched against experienced situations: in this environment, for this driving task, how should I drive? The experience database records how I handled it in the past and what worked best. Once a match is found, that experience is used to recognize the situation and manipulate the steering wheel.

This process can also be realized through deep learning. We have done this, reasoning with a second model: in this environment, facing these obstacles, what should I do, and how much should the driving speed change? The resulting model can likewise be expressed through deep learning.
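A toy version of the experience-matching step might look like the following: the current situation is encoded as a feature vector and matched against a memory pool of past (situation, action) pairs, with the nearest stored situation supplying the action. The features, the Euclidean metric, and the numbers are all invented for illustration; the deep model mentioned above would replace this simple lookup.

```python
# Hypothetical experience matching: the nearest stored situation supplies the action.
import numpy as np

# Invented memory pool: rows are [gap_m, speed_mps, road_curvature],
# each paired with the (speed_delta, steer_delta) the driver applied.
situations = np.array([[40.0, 10.0, 0.00],
                       [15.0, 10.0, 0.00],
                       [30.0,  8.0, 0.05]])
actions    = np.array([[ 0.5, 0.00],    # open road: speed up slightly
                       [-1.0, 0.00],    # closing gap: brake
                       [ 0.0, 0.04]])   # curve: steer, hold speed

def match_experience(current: np.ndarray) -> np.ndarray:
    """Return the action of the nearest stored situation (Euclidean metric)."""
    idx = np.argmin(np.linalg.norm(situations - current, axis=1))
    return actions[idx]

# A situation close to the "closing gap" memory retrieves the braking action.
print(match_experience(np.array([16.0, 9.5, 0.0])))   # -> [-1.  0.]
```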

Another important task of our expert group is vehicle detection. The problem is that since 1998 no published vehicle detection method had worked completely without training samples. We proposed such a method here, based on triple inference between two-dimensional and three-dimensional space followed by cross-validation. The framework is entirely free of training samples and makes full use of the three-dimensional semantic scene and the image information.

In the past eight years we have also done a lot of work on the cognitive mechanisms of audiovisual information, for example a paper in Neuron in 2012 and one at IEEE CVPR 2010. The CVPR work greatly improved the method's efficiency. The Neuron work concerns temporal segmentation in vision, using temporally structured noise stimuli to study the brain's time-segmentation process and identifying two optimized time scales.

Everyone knows there is an international brain imaging conference whose keynote speakers are elected by the organizing committee's academic committee; it is very difficult to be chosen to report there, and our country had not had a speaker in more than 20 years. At that conference, Academician Chen Lin was the first from China to deliver a keynote report on the conference theme.

We also have very good work on multi-channel brain-computer interfaces. For two consecutive years, this article was listed as the most cited article in its journal.

(PPT)

This is high-speed character input with a non-invasive brain-computer interface, doubling the speed of character input. The article was published in the Proceedings of the National Academy of Sciences and is the best work in this field. We also put the brain-computer interface on the unmanned vehicle, using brain control to direct its movement, and implemented automatic parking through the brain-computer interface. This is the brain-computer interface competition we have kept running since 2008; we are now world leaders in non-invasive brain-computer interfaces.

Here are some demonstrations to go with my report. This figure shows the run from Changsha to Wuhan in July 2011, a total of 286 kilometers lasting 3 hours and 22 minutes. There was rain on the route and there was overtaking, yet total manual intervention amounted to only 2,140 meters. The second run, completed on November 25, 2014, was a long-distance autonomous driving trial on the Beijing-Tianjin expressway, lasting 1 hour and 30 minutes.

(video)

In the final two minutes, let me introduce the unmanned vehicle Future Challenge. From July 2009 to the end of last year, we held seven competitions in total. The first, in Xi'an's Chanba Ecological District, ran over a 2.6-kilometer course; the 2010 event, held at Chang'an University in Xi'an, was also 2.6 kilometers; in Ordos in 2011 the course grew to 10 kilometers. The table behind me shows the basic facts of these competitions: starting in 2013 the event moved to Changshu, Jiangsu, where it was also held in 2014 and 2015. Seven competitions are listed in the table. Over these seven years the number of participating teams has grown, peaking at 22, and the courses have become more and more complicated, from 2.6 km to 6.7 km, 10 km, and 13.5 km. Judging from the results, human intervention has essentially disappeared and speeds keep increasing; the Changsha-to-Wuhan demonstration I just showed and the Beijing-Tianjin run were both completed without intervention. Our competition has thus moved from closed roads to real road environments.

In conclusion, we have achieved a great deal in the past eight years, but some work we consider very important still lies ahead. The first is how to turn the results of cognitive mechanism research into computable models; we have explored many methods and need to improve them further. The second is how context-aware topological information is expressed and understood in the cognitive process. In addition, we will explore emerging multimodal sensors, for example sensors that integrate sound and video information into one. The last is hybrid human-machine intelligence, which is also part of the 'Artificial Intelligence 2.0' initiative recently proposed at the national level; we must research hybrid human-computer intelligent systems.

Finally, our hopes are these: first, to use this platform to publish more results in cognitive science; second, to integrate natural language understanding and brain-computer interfaces into the vehicle platform, since many of these results are still in the lab; and third, to use the significant progress made on the unmanned vehicle platform to further promote innovation and lead the development of the unmanned vehicle industry.

In the end, I close my sharing today with a poem in celebration of the Global Artificial Intelligence and Robotics Summit. (PPT)

Thank you all!
