Research Trends

Bayesian theory models how humans use probabilistic knowledge to process language and to draw the inferences needed to comprehend it. By combining Bayesian inference with network algorithms that capture topological relationships, I expect the integrated methodology to be more powerful for explaining language comprehension, production, acquisition and evolution.


In information theory, entropy, H(X) (Shannon, 1948), can be seen as the average information content of encoding units. Using entropy, we can compare the information-encoding potential of different levels of structure within the same language.
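
As an illustration of the comparison described above, the following minimal sketch computes Shannon entropy, H(X) = -Σ p(x) log2 p(x), over two levels of units (characters and words) in the same text. The function name `shannon_entropy` and the toy sentence are assumptions made for illustration only; they are not taken from the studies described here, and real analyses would use large corpora and may need corrections for sample size.

```python
import math
from collections import Counter


def shannon_entropy(units):
    """Return the Shannon entropy (in bits) of a sequence of encoding units.

    Probabilities are estimated from relative frequencies in `units`.
    """
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy text; any tokenised corpus could be substituted.
text = "the cat sat on the mat"

# Entropy at the character level (spaces removed) and at the word level.
char_entropy = shannon_entropy(text.replace(" ", ""))
word_entropy = shannon_entropy(text.split())

print(f"character-level entropy: {char_entropy:.3f} bits")
print(f"word-level entropy:      {word_entropy:.3f} bits")
```

The same function can be applied to any unit inventory (phonemes, syllables, morphemes, words), which is what makes entropy convenient for comparing levels of structure.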


Text (discourse) structure is one of my ongoing areas of research. It focuses on the relations among textual units, most notably on how the concepts and propositions expressed by these units are logically and semantically related. My goal is to develop formal (computational/mathematical) methods that describe discourse structure and provide a theoretical framework for it. Exploring available corpora from a new perspective should be an effective way to probe discourse structure quantitatively. Since an RST (rhetorical structure theory) tree can be converted into a tree similar to a syntactic dependency tree, the data extracted from an RST treebank can be used to calculate discourse dependency distance. The data derived from the RST corpus can also easily be converted into network data. I found that discourse structure in English tends to minimize dependency distance and that each type of RST relation has its own range of dependency distance. The frequency distribution of discourse dependency distances basically follows a power law. A network approach reveals that discourse units are arranged in regular patterns. Dependency distance captures how RST relations combine constituents into patterned discourse, whereas the network approach concerns the topological relations among discourse units. The two methods complement each other in revealing how people process discourse. These findings suggest that the integrated approach can serve as an effective computational model for measuring text comprehension and complexity.
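
The two measurements described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the studies: the toy dependency list, the relation labels attached to it, and the function names are all hypothetical. It assumes that an RST tree has already been converted into head-dependent pairs over linearly ordered discourse units, and it takes dependency distance to be the absolute difference between the positions of a dependent and its head.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (dependent position, head position, relation).
# Positions are 1-based indices of discourse units; head 0 marks the root.
deps = [
    (1, 0, "ROOT"),
    (2, 1, "Elaboration"),
    (3, 1, "Contrast"),
    (4, 3, "Cause"),
    (5, 3, "Elaboration"),
]


def dependency_distances(deps):
    """Return (relation, distance) pairs, skipping the root link."""
    return [(rel, abs(head - dep)) for dep, head, rel in deps if head != 0]


def mean_dependency_distance(deps):
    """Average distance over all non-root dependencies."""
    distances = [d for _, d in dependency_distances(deps)]
    return sum(distances) / len(distances) if distances else 0.0


def distance_by_relation(deps):
    """Group distances by relation type to inspect each relation's range."""
    grouped = defaultdict(list)
    for rel, d in dependency_distances(deps):
        grouped[rel].append(d)
    return dict(grouped)


def degree_counts(deps):
    """Treat units as network nodes and dependencies as undirected edges."""
    degree = Counter()
    for dep, head, _ in deps:
        if head != 0:
            degree[dep] += 1
            degree[head] += 1
    return degree


print("mean dependency distance:", mean_dependency_distance(deps))
print("distances by relation:", distance_by_relation(deps))
print("node degrees:", dict(degree_counts(deps)))
```

The distance functions address the linear arrangement of units, while `degree_counts` gives the simplest topological property of the corresponding network; richer network measures would need a dedicated graph library.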


With regard to diachronic change in linguistic phenomena, I mainly focus on academic writing, compounds and punctuation, considering frequency, information density and semantic change. Although these studies concern different subjects, they share a consistent methodology.

The importance of punctuation in writing systems has been largely ignored, even though punctuation marks are highly significant for the clear expression of thought. I analyzed the frequency distribution of English punctuation marks in several large corpora. From both the diachronic and synchronic perspectives, I found that the frequency distribution of English punctuation follows the principle of least effort. The varieties of English were also found to differ in the frequencies of specific punctuation marks. Over the last three hundred years, punctuation practice has become more syntactic than rhetorical or prosodic in nature. These developments show that modern stylistic-grammatical punctuation is developing under the influence of modern writing and communication technologies. With respect to Chinese, in contrast to European punctuation traditions, I adopt contemporary linguistic viewpoints to explore the nature of punctuation marks in ancient Chinese texts and to formulate principles for them. This reveals that, since ancient times, the Chinese language has lacked a binary punctuation system that marks the two syntactic levels of clause and sentence simultaneously. The “fuzziness” of “Juzi” (sentence) in the contemporary Chinese grammar system can thus be explained: excessive grammatical information was produced once the binary punctuation system adapted from European punctuation was introduced. This study argues that the root cause of punctuation problems in modern Chinese is the mismatch between binary-level punctuation and the Chinese language.