Zhengqi He is a research scientist at the RIKEN Center for Brain Science, collaborating with Professor Taro Toyoizumi. His research focuses on computational theories for natural language processing in both the brain and AI, higher-order cognitive functions, and the mining of biological big data. Previously, he worked as a research associate at the Facility for Rare Isotope Beams, where he led the development of the online beam-tuning model FLAME.
PhD in Engineering Physics, 2014
Tsinghua University
Graduate Program Candidate in Physics, 2013
Michigan State University
B.E. in Engineering Physics, 2010
Tsinghua University
Minor in Applied Computer Science, 2010
Tsinghua University
Understanding how humans process natural language has long been a vital research direction. The field of natural language processing (NLP) has recently seen a surge in the development of powerful language models. These models have proven to be invaluable tools for studying another complex system known to process human language: the brain. Previous studies have demonstrated that the features of language models can be mapped to fMRI recordings of brain activity. This raises the question: is there a commonality between information processing in language models and the human brain? To estimate information-flow patterns in a language model, we examined the causal relationships between its layers. Drawing inspiration from the workspace framework for consciousness, we hypothesized that features integrating more information would more accurately predict activity higher in the cortical hierarchy. To test this hypothesis, we classified language-model features into two categories based on causal network measures, "low in-degree" and "high in-degree", and compared the brain prediction accuracy maps for the two groups. Our results reveal that the difference in prediction accuracy follows a hierarchical pattern, consistent with the cortical hierarchy map revealed by intrinsic time constants. This finding suggests a parallel between how language models and the human brain process linguistic information.
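For concreteness, the analysis pipeline can be sketched roughly as below; this is a minimal illustration with synthetic data, in which a random causal graph and a ridge-regression encoding model stand in for the paper's actual layer-wise causal estimates and fMRI recordings.

```python
# Sketch: split features by causal in-degree, then compare how well each
# group predicts (here, synthetic) brain responses. All sizes are placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_features, n_samples, n_voxels = 64, 500, 10   # hypothetical sizes

# Binary causal adjacency matrix: causal[i, j] = 1 means feature i drives j,
# so the in-degree of feature j is the j-th column sum.
causal = (rng.random((n_features, n_features)) < 0.1).astype(int)
in_degree = causal.sum(axis=0)

# Split features into "low in-degree" and "high in-degree" at the median.
median = np.median(in_degree)
low_idx = np.where(in_degree <= median)[0]
high_idx = np.where(in_degree > median)[0]

X = rng.standard_normal((n_samples, n_features))   # stand-in LM features
Y = rng.standard_normal((n_samples, n_voxels))     # stand-in fMRI responses

def prediction_accuracy(idx):
    """Cross-validated ridge-regression score, averaged over voxels."""
    model = Ridge(alpha=1.0)
    return np.mean([cross_val_score(model, X[:, idx], Y[:, v], cv=5).mean()
                    for v in range(n_voxels)])

# The quantity of interest is the difference map between the two groups
# (meaningless here because the data are random).
diff = prediction_accuracy(high_idx) - prediction_accuracy(low_idx)
print(f"high-minus-low in-degree accuracy difference: {diff:.3f}")
```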
A deep neural network is a good task solver, but it is difficult to make sense of its operation, and there is little consensus on how such an interpretation should be formed. We approach this problem from a new perspective, in which the interpretation of task solving is synthesized by quantifying how much and what previously unused information is exploited in addition to the information used to solve earlier tasks. First, after learning several tasks, the network acquires an information partition related to each task. We propose that the network then learns the minimal information partition that supplements the previously learned partitions so as to represent the input more accurately. This extra partition is associated with un-conceptualized information that has not been used in previous tasks. We identify what un-conceptualized information is used and quantify its amount. To interpret how the network solves a new task, we quantify as meta-information how much information is extracted from each partition. We implement this framework with the variational information bottleneck technique and test it on the MNIST and CLEVR datasets. The framework is shown to compose information partitions and to synthesize experience-dependent interpretations in the form of meta-information. The system progressively improves the resolution of its interpretations with new experience by converting part of the un-conceptualized information partition into a task-related partition. It can also provide a visual interpretation by imaging which part of the previously un-conceptualized information is needed to solve a new task.
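The variational information bottleneck at the core of this framework can be sketched in a few lines of PyTorch; the partition bookkeeping and meta-information measures described above are omitted, and all network sizes are placeholders.

```python
# Minimal variational information bottleneck (VIB): a stochastic encoder with
# a KL penalty that limits how much information the bottleneck z carries.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIB(nn.Module):
    def __init__(self, in_dim=784, bottleneck=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * bottleneck))
        self.decoder = nn.Linear(bottleneck, n_classes)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.decoder(z), mu, logvar

def vib_loss(logits, y, mu, logvar, beta=1e-3):
    # Task term plus KL(q(z|x) || N(0, I)); beta trades accuracy for compression.
    ce = F.cross_entropy(logits, y)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return ce + beta * kl

# Usage with random MNIST-shaped stand-in data:
model = VIB()
x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
logits, mu, logvar = model(x)
vib_loss(logits, y, mu, logvar).backward()
```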
Recent advances in the development of large language models have led to substantial performance gains across an array of downstream tasks. Remarkably, these models, trained with straightforward end-to-end objectives, have demonstrated an inherent ability to handle language tasks that, not long ago, depended heavily on an in-depth understanding of language. The convergence of these trends provides an excellent opportunity to examine their relationship. Specifically, we pose the question: can contemporary deep neural network (DNN) based end-to-end language-modeling paradigms provide us with insights into language? In this paper, we focus on a long-standing linguistic debate: can syntax and semantics be separated? We argue that by incorporating an inductive bias for division of labor, a separation between syntax and semantics emerges naturally in the English language. To demonstrate this, we employ a two-tower language-model setup, in which two language models with identical configurations are trained collaboratively in parallel. Intriguingly, this configuration gives rise to a spontaneous preference in which specific tokens are consistently better predicted by one tower and others by the second tower. This pattern remains qualitatively consistent across different model structures and reflects the separation of syntax and semantics. Our findings show the potential of DNN-based, end-to-end trained language models to deepen our comprehension of the properties of natural language.
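As an illustration, a toy version of the two-tower setup might look as follows; the winner-take-all token routing here is one plausible reading of the division-of-labor inductive bias, not necessarily the paper's exact objective, and the tiny LSTM towers stand in for the actual model configurations.

```python
# Two identical language models trained in parallel; each token's loss is
# charged to whichever tower already predicts it better, so the towers can
# specialize in different kinds of tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens):
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)   # next-token logits, shape (batch, seq, vocab)

tower_a, tower_b = TinyLM(), TinyLM()   # identical configurations
opt = torch.optim.Adam(list(tower_a.parameters()) + list(tower_b.parameters()))

tokens = torch.randint(0, 1000, (4, 32))        # stand-in token batch
inputs, targets = tokens[:, :-1], tokens[:, 1:]

# Per-token cross-entropy for each tower, shape (batch, seq).
loss_a = F.cross_entropy(tower_a(inputs).transpose(1, 2), targets, reduction="none")
loss_b = F.cross_entropy(tower_b(inputs).transpose(1, 2), targets, reduction="none")

# Winner-take-all routing: train each tower only on the tokens it handles best.
mask_a = (loss_a < loss_b).float()
loss = (mask_a * loss_a + (1 - mask_a) * loss_b).mean()

opt.zero_grad()
loss.backward()
opt.step()
# mask_a then records which tower "owns" each token; in the paper, this
# assignment is what turns out to track a syntax/semantics split.
```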
Lab for Neural Computation and Adaptation, RIKEN Center for Brain Science