DS0807 -

The mathematics of segmental phonotactics – MathSegPhon

Submission summary

Languages differ widely in their inventories of phonological sounds. Optimality Theory (OT) models segment inventories through rankings of feature co-occurrence constraints (FCCs), which penalize certain combinations of feature values. This project develops the first comprehensive theory of FCCs through the Tree Hypothesis (TH). The TH maintains that the FCCs define universal relationships among the features that can be represented through a feature interaction graph which crucially has no loops: it is a tree. The project explores the TH from both a typological and a learnability perspective through four work packages (WPs).
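As a rough illustration of what the "no loops" condition amounts to, the sketch below checks whether a small feature interaction graph is a tree. It rests on assumptions that are not part of the summary: the graph is treated as undirected, two features are linked whenever some FCC mentions both, and the feature names and the function name `is_tree` are purely hypothetical.

```python
from collections import defaultdict

def is_tree(features, edges):
    """Return True if the undirected graph (features, edges) is connected and loop-free."""
    features = list(features)
    if not features:
        return False
    # A tree on n nodes has exactly n - 1 edges.
    if len(edges) != len(features) - 1:
        return False
    # Build an adjacency map (assumption: an edge links two features
    # that co-occur in at least one FCC).
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Depth-first search from an arbitrary feature; with n - 1 edges,
    # reaching every feature is equivalent to having no loops.
    seen = {features[0]}
    stack = [features[0]]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(features)

# Hypothetical feature set and interactions, for illustration only.
FEATURES = ["voice", "nasal", "continuant", "place"]
PATH = [("voice", "nasal"), ("nasal", "continuant"), ("continuant", "place")]
LOOP = [("voice", "nasal"), ("nasal", "continuant"), ("continuant", "voice")]
```

Under these assumptions, `is_tree(FEATURES, PATH)` holds because a path is the simplest kind of tree, while `is_tree(FEATURES, LOOP)` fails because three of the features interact in a cycle.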

What are the typological implications of the TH's claim that feature interactions have no loops? What types of segment inventories does this hypothesis exclude? WP1 addresses these questions through a formal analysis of the factorial OT typologies predicted by the TH. This analysis will rely on a toolkit for formal factorial analyses recently developed by A. Prince (Dept of Linguistics, Rutgers University) and N. Merchant (Dept of Mathematics, Eckerd College), who will be partners on WP1.
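To make the notion of a factorial typology concrete, here is a toy sketch that enumerates every ranking of a tiny constraint set and collects the segment inventories those rankings generate. It is not the Prince and Merchant toolkit. The two features, the single FCC, and the two faithfulness constraints are hypothetical choices made only for illustration.

```python
from itertools import permutations, product

# Segments are (voice, nasal) pairs of feature values (hypothetical feature set).
SEGMENTS = list(product((True, False), repeat=2))

def fcc_nasal_voiceless(inp, out):
    # Hypothetical FCC: penalize outputs that are [+nasal, -voice].
    voice, nasal = out
    return int(nasal and not voice)

def ident(index):
    # Faithfulness: penalize a change in the feature at position `index`.
    def constraint(inp, out):
        return int(inp[index] != out[index])
    return constraint

CONSTRAINTS = {
    "*[+nasal,-voice]": fcc_nasal_voiceless,
    "Ident[voice]": ident(0),
    "Ident[nasal]": ident(1),
}

def winner(inp, ranking):
    # OT evaluation: compare violation profiles lexicographically,
    # highest-ranked constraint first.
    return min(SEGMENTS,
               key=lambda out: tuple(CONSTRAINTS[c](inp, out) for c in ranking))

def inventory(ranking):
    # The inventory of a ranking is the set of segments that surface
    # as the winner for at least one input.
    return frozenset(winner(inp, ranking) for inp in SEGMENTS)

def factorial_typology():
    # One inventory per ranking; distinct rankings may yield the same inventory.
    return {inventory(ranking) for ranking in permutations(CONSTRAINTS)}
```

In this toy system the six rankings collapse into two inventories: the full set of four segments, when both faithfulness constraints outrank the FCC, and the set without the voiceless nasal otherwise. WP1 asks the analogous question for constraint systems that obey the TH: which inventories can never arise under any ranking.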

An intractability result previously obtained by the applicant will be strengthened to show that the task of learning segmental phonotactics is provably too hard (it admits no general efficient learner) without restrictions on the FCCs. Does the typological structure provided by the TH (as distilled in WP1) suffice to support efficient learning? WP2 will address this question from the perspective of both batch and error-driven learners. The aim is to obtain analytical guarantees rather than just simulation results. This part of the project will be carried out mainly by the applicant, in consultation with Prince and Merchant. It builds on a pilot result which has established learnability for the special case where the feature interaction graph is a directed path (the simplest type of tree).

Is the typological structure formally predicted by the TH (WP1) and justified by learnability (WP2) actually attested? To address this question, WP3 will harmonize existing databases of segment inventories (UPSID, PHOIBLE, P-Base, WDP) and develop software for automatically extracting inventories of segment types from the resulting large database. WP3 will be carried out by S. El Ayari (SFL), in collaboration with the applicant and a post-doc with substantial expertise in segmental phonology (hired on the ANR grant for 18 months).

WP4 will then develop a system of FCCs which complies with the TH and yields a factorial OT typology of segment inventories that closely matches the typology documented by the database. A further question is whether this match can be obtained through FCCs which are also phonetically groundable (in the sense of Hayes and Steriade 1999), thus investigating the connection between formal learnability and phonetic substance. The post-doc will be mainly responsible for WP4, under the supervision of the applicant and in collaboration with F. Torres-Tamarit (SFL) and Prince.

The project provides new impetus to the search for phonological universals, moving away from the concrete universals considered in the literature (such as “every language has a coronal stop”) towards deeper, formal universals such as the TH, with potentially substantial learnability implications. Furthermore, the project supplements the traditional poverty-of-the-stimulus arguments with hardness-of-the-task arguments, which consist of mathematical proofs that the learning problem is intractable (no matter how rich the input) without the additional structure provided by formal universals such as the TH. Through a team with complementary expertise in mathematical learnability and segmental phonology, the project will bring the generative axiom of a connection between learnability and typology to a new level of mathematical sophistication.

Project coordination

Giorgio Magri (Structures Formelles du Langage)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.


SFL Structures Formelles du Langage

ANR funding: 112,626 euros
Beginning and duration of the scientific project: December 2016 - 24 Months
