Grants and Projects

"Chinese discourse structure: Perspectives of topic chain and event knowledge", the National Social Science Fund of China, Grant No. 15YY038, PI, finished (2015.7-2020.7).

"Quantitative and cognitive studies on 'run-on' sentences in Chinese", the China Postdoctoral Science Foundation, Outstanding Fund, Grant No. 2018T110581, the project leader, finished (2017.9-2020.9).

"The textual function of topic chains and its application of E-C and C-E translation", the Social Science Fund of Zhejiang Province, China, Grant No. 13NDYB145, the project leader, finished (2013.7-2015.12).

"A computational approach to commas in Chinese texts", the Fund of Zhejiang Provincial Social Science Association, Grant No. 2012N067, the project leader, finished (2012.7-2014.7).

"The Design and Establishment of 'One-Step' English Teaching and Learning via Multi-media Network Based on GOOGLE", the Education Fund of Zhejiang Province, Grant No. 2014SCG090, the project leader, finished (2012.12-2015.12).

"The complex structure of news texts in Chinese", the National Social Science Fund of China, Grant No. 18BYY184, Co-PI, in progress.

"Understanding Discourse Structure: An Integrated Computational Approach to its Linguistic and Cognitive Mechanisms", German Research Foundation (DFG), PI, pending.


A discourse, or text, generally consists of multiple clauses or sentences. Its parts are interrelated and form a coherent whole that clearly expresses a meaning. Since communication mostly takes place in discourse (text), research in linguistics cannot confine itself to words and sentences. Discourse in the wider sense underlies disciplines such as law, religion, politics and science, among others. Discourse structure, like syntax, concerns the ways in which discourse units are brought together to form a coherent discourse. Many different theories of discourse structure have been developed, and discourse corpora have been built on the basis of these theories. This creates a problem for researchers: the very fact that these corpora rest on different theoretical bases makes it difficult to explore them in a consistent and unified way. A second problem is that the original data structure of discourse corpora has hindered the use of a wider range of algorithms for effective quantitative analysis and discourse parsing. This project aims to overcome these difficulties by integrating a dependency-network approach and information-theoretic metrics into a quantitative (computational) framework for studying discourse. The project has two key goals: (i) to develop a means of extracting data from different discourse corpora in a unified way; and (ii) to examine such data using a quantitative framework that takes into account both communicative efficiency and complexity, thus enabling cross-language comparisons of discourse structure and the investigation of the cognitive mechanisms underlying the comprehension of discourse.


In order to evaluate the feasibility of these goals, we will investigate whether discourse dependency representations with a uniform format can be extracted from the available discourse corpora in different languages, and whether the merger of discourse distance, discourse network and information-theoretic metrics can yield a quantitative framework for the study of discourse. The investigations in this project will elucidate linguistic (e.g., coherence, linguistic complexity) and communicative mechanisms at the textual level. By examining discourse structure quantitatively across different languages, the project will also deepen our understanding of the underlying patterns of human thought (e.g., patterns of language efficiency) in the organization of discourse and of the cognitive mechanisms involved in discourse comprehension. The findings have strong potential to improve artificial intelligence systems in natural language processing, such as chatbots, discourse parsing, machine translation, sentiment analysis and text summarization. More generally, this project will illuminate text-based disciplines by helping to provide new methods for improving the analysis of legal, media, political, religious and scientific discourse.
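
As a rough illustration of the kind of measures mentioned above (discourse distance, discourse network and information-theoretic metrics), the following minimal Python sketch computes three simple quantities over a toy discourse dependency structure. The edge format `(unit, head, relation)`, the relation labels and the function names are assumptions made for this example only; they are not the project's actual data format or method.

```python
import math
from collections import Counter

# Hypothetical uniform format: (unit_position, head_position, relation).
# head_position 0 marks the root unit; the labels are illustrative only.
edges = [
    (1, 0, "root"),
    (2, 1, "elaboration"),
    (3, 1, "cause"),
    (4, 3, "elaboration"),
]


def mean_dependency_distance(edges):
    """Mean absolute linear distance between each unit and its head (root excluded)."""
    distances = [abs(unit - head) for unit, head, _ in edges if head != 0]
    return sum(distances) / len(distances) if distances else 0.0


def relation_entropy(edges):
    """Shannon entropy (in bits) of the relation-label distribution (root excluded)."""
    counts = Counter(rel for _, head, rel in edges if head != 0)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dependents_per_head(edges):
    """Number of dependents attached to each head: a simple network degree measure."""
    return Counter(head for _, head, _ in edges if head != 0)


mdd = mean_dependency_distance(edges)
entropy = relation_entropy(edges)
degrees = dependents_per_head(edges)
```

Because all three functions read the same edge list, any corpus that can be converted to such a list could in principle be measured and compared in the same way, which is the motivation for a uniform extraction format.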