MIT Technology Review: A new immersive classroom uses AI and VR to teach Mandarin Chinese

Posted July 16, 2019
Students will learn the language by ordering food or haggling with street vendors on a virtual Beijing street.

Often the best way to learn a language is to immerse yourself in an environment where people speak it. The constant exposure, along with the pressure to communicate, helps you swiftly pick up and practice new vocabulary. But not everyone gets the opportunity to live or study abroad.

In a new collaboration with IBM Research, Rensselaer Polytechnic Institute (RPI), a university based in Troy, New York, now offers its students studying Chinese another option: a 360-degree virtual environment that teleports them to the busy streets of Beijing or a crowded Chinese restaurant. Students get to haggle with street vendors or order food, and the environment is equipped with different AI capabilities to respond to them in real time. While the classroom is largely experimental, it is being used for the first time in a six-week, for-credit course at the university this summer.

The project was inspired by two RPI faculty members who often used role-playing games to help their students learn Chinese. In parallel, over the last few years, several studies have found that interactive learning environments can increase language understanding and retention. One study published in 2018 also found that students learning Japanese in a 3D virtual environment were likelier to pick up vocabulary they encountered incidentally through the simulation. Building on these ideas, the professors struck up a collaboration with IBM Research to explore whether they could replicate such benefits for their own students.

In addition to surrounding the students with digital projections of a scene, the environment uses several types of sensors to dynamically adapt to the students’ words and actions. Microphones, worn by the participants, feed their audio directly into speech-recognition algorithms. Cameras track their movements and gestures to register when they point to various objects or walk up to different virtual agents. If a student points to a food dish in the restaurant scene and asks what it is, for example, a virtual agent can respond with the name and description. Narrative-generation technology also allows each agent to construct more sophisticated answers to off-the-cuff questions (“What’s the dish’s history?”) using knowledge from Wikipedia. (The conversation topics are still somewhat constrained, however, to whatever task the student is trying to complete.)
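To make that multimodal flow concrete, here is a minimal, hypothetical sketch of the kind of event loop the article describes: speech recognition supplies a transcript, camera tracking supplies a pointing target, and the two are fused so a virtual agent can answer. Every name in it (StudentEvent, agent_reply, KNOWLEDGE_BASE, and so on) is an illustrative assumption, not IBM's or RPI's actual interface.

```python
from __future__ import annotations
from dataclasses import dataclass

# Toy knowledge base standing in for the Wikipedia-backed
# narrative-generation component the article mentions (assumed structure).
KNOWLEDGE_BASE = {
    "mapo_tofu": {
        "name_zh": "麻婆豆腐",
        "description": "A spicy Sichuan tofu dish.",
        "history": "Said to have originated in Chengdu in the late Qing dynasty.",
    },
}

@dataclass
class StudentEvent:
    transcript: str             # output of the speech-recognition step
    pointed_object: str | None  # object id from camera/gesture tracking, if any

def agent_reply(event: StudentEvent) -> str:
    """Fuse the speech transcript and the pointing gesture into one agent response."""
    if event.pointed_object is None:
        return "请再说一遍？ (Could you say that again?)"
    entry = KNOWLEDGE_BASE.get(event.pointed_object)
    if entry is None:
        return "我不认识这个。 (I don't know this one.)"
    # Route off-the-cuff follow-ups ("What's the dish's history?") to the
    # knowledge base, as the narrative-generation layer would.
    if "history" in event.transcript.lower() or "历史" in event.transcript:
        return entry["history"]
    return f"这是{entry['name_zh']}。 {entry['description']}"

# Example: the student points at a dish and asks what it is, then follows up.
print(agent_reply(StudentEvent("What is that dish?", "mapo_tofu")))
print(agent_reply(StudentEvent("What's the dish's history?", "mapo_tofu")))
```

The sketch deliberately keeps topics task-bound, as the article notes the real system does: anything outside the knowledge base for the current scene falls back to a stock reply rather than open-ended conversation.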
