English version of the manuscript on the phylogeny of the Lezgian language group:
Alexei Kassian, ‘Towards a formal genealogical classification of the Lezgian languages (North Caucasus)’.
Compared with the Russian version, the text has been expanded: the algorithms are additionally tested on a second input matrix in which cognation is marked automatically on the basis of phonetic similarity (Levenshtein distances). The outcome of this additional test is that the distance-based algorithms proved more reliable than the character-based (discrete) ones (sic!).
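The automatic cognate marking mentioned above can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not the procedure used in the manuscript: the `CLASS_MAP` table is a hypothetical, heavily simplified grouping loosely inspired by Dolgopolsky-style consonant classes and covers only Latin-script transcriptions, and the 0.5 threshold and single-linkage grouping are illustrative choices. Each word is reduced to a string of consonant-class labels, pairs of words are compared by normalized Levenshtein distance, and words whose distance does not exceed the threshold end up in the same cognate set.

```python
from __future__ import annotations

# HYPOTHETICAL toy consonant classes, loosely Dolgopolsky-inspired; Latin script only.
# Characters missing from this table (vowels included) are simply dropped.
CLASS_MAP = {
    **dict.fromkeys("pbfv", "P"),   # labial obstruents
    **dict.fromkeys("td", "T"),     # dental stops
    **dict.fromkeys("szc", "S"),    # sibilants
    **dict.fromkeys("kgqx", "K"),   # velar/uvular obstruents
    "m": "M", "n": "N", "r": "R", "l": "R",  # l grouped with r as a liquid
    "w": "W", "j": "J", "h": "H",
}


def consonant_skeleton(word: str) -> str:
    """Map each known consonant to its class label; drop everything else."""
    return "".join(CLASS_MAP[ch] for ch in word.lower() if ch in CLASS_MAP)


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] by the longer string's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def cognate_classes(words: dict[str, str], threshold: float = 0.5) -> dict[str, int]:
    """Assign cognate-set indexes for one meaning slot (lect -> word).

    Words whose consonant skeletons lie within `threshold` are linked, and
    linked words share one set (single linkage via union-find).
    The threshold value is an illustrative assumption.
    """
    langs = list(words)
    parent = {lang: lang for lang in langs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    skel = {lang: consonant_skeleton(w) for lang, w in words.items()}
    for i, a in enumerate(langs):
        for b in langs[i + 1:]:
            if normalized_distance(skel[a], skel[b]) <= threshold:
                parent[find(a)] = find(b)
    # Renumber the sets as 1, 2, 3, ... in order of first appearance.
    index: dict[str, int] = {}
    return {lang: index.setdefault(find(lang), len(index) + 1) for lang in langs}
```

For example, `cognate_classes({"A": "kata", "B": "gada", "C": "miru"})` gives `kata` and `gada` the same skeleton `KT` and hence the same index, while `miru` (`MR`) gets its own. Real transcriptions of Lezgian lects would need a much richer class table, covering ejectives, uvulars and pharyngeals.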
A lexicostatistical classification of 20 languages and dialects of the Lezgian group of the North Caucasian family is proposed, based on the extremely high-quality 110-item wordlists of the Global Lexicostatistical Database project. The main phylogenetic methods, both distance-based and character-based, are applied sequentially to the lexical data: Starling neighbor joining (StarlingNJ), neighbor joining (NJ), unweighted pair group method with arithmetic mean (UPGMA), Markov chain Monte Carlo (MCMC) and unweighted maximum parsimony (UMP). Cognation indexes in the input matrix were assigned by two different procedures: the traditional etymological approach and phonetic similarity, i.e., the automatic consonant-class method (Levenshtein distances). For the etymology-based input matrix, all the phylogenetic methods except UMP yielded trees sufficiently compatible with each other to compile a summary phylogenetic tree of the Lezgian lects. The resulting summary tree corresponds to the traditional classification and to some previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, UMP produced the least plausible tree of all. For the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) produced trees fairly close to the summary etymology-based tree, whereas the character-based methods (MCMC, UMP) yielded less reliable trees.
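Of the distance-based methods listed, UPGMA is the simplest to illustrate. The Python sketch below is an assumption-laden illustration, not the software used in the study (in particular, it is not StarlingNJ). It takes a toy matrix of cognate indexes (one list per lect, one position per meaning slot), measures the distance between two lects as the share of comparable slots whose indexes differ, skipping missing entries marked `None`, and then repeatedly merges the closest pair of clusters, averaging distances weighted by cluster size. The function names, the distance definition and the Newick output with three-decimal branch lengths are hypothetical choices for this example; lect names are assumed to contain no Newick special characters.

```python
from __future__ import annotations

from itertools import combinations


def lexicostat_distance(a: list[int | None], b: list[int | None]) -> float:
    """Share of comparable meaning slots whose cognate indexes differ.

    None marks a missing entry; slots missing in either lect are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no comparable slots")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(matrix: dict[str, list[int | None]]) -> dict[frozenset, float]:
    """Pairwise distances keyed by frozenset({lect1, lect2})."""
    return {
        frozenset((x, y)): lexicostat_distance(matrix[x], matrix[y])
        for x, y in combinations(matrix, 2)
    }


def upgma(names: list[str], dist: dict[frozenset, float]) -> str:
    """Rooted UPGMA tree as a Newick string (branch length = height difference)."""
    dist = dict(dist)                      # do not mutate the caller's matrix
    size = {n: 1 for n in names}           # number of lects in each cluster
    height = {n: 0.0 for n in names}       # node heights; leaves sit at 0
    clusters = list(names)
    while len(clusters) > 1:
        # Closest pair among the current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        h = dist[frozenset((a, b))] / 2
        new = f"({a}:{h - height[a]:.3f},{b}:{h - height[b]:.3f})"
        for c in clusters:
            if c not in (a, b):
                # Size-weighted average of the distances to the merged clusters.
                dist[frozenset((new, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new], height[new] = size[a] + size[b], h
        clusters = [c for c in clusters if c not in (a, b)] + [new]
    return clusters[0] + ";"
```

For example, with `m = {"A": [1, 1, 1, 1], "B": [1, 1, 1, 2], "C": [2, 2, 1, 3]}`, lects `A` and `B` (distance 0.25) are joined first and `C` attaches at height 0.375, so `upgma(list(m), distance_matrix(m))` returns `(C:0.375,(A:0.125,B:0.125):0.250);`. Because UPGMA assumes a constant rate of change on all branches, its trees can differ from those of neighbor joining, which does not make that assumption.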