Objective: To construct risk prediction models for developmental dyslexia in children based on genetic and environmental factors, verify their predictive performance, identify the optimal model, and provide a reference for the early identification of children at high risk of developmental dyslexia.
Methods: (1) The influencing factors of developmental dyslexia were extracted through meta-analysis and used to compile the Questionnaire on Influencing Factors of Chinese Reading Ability. (2) A total of 5554 children in grades 3 to 5 were selected from 7 primary schools in Xinjiang by random cluster sampling. Following the group matching principle, 284 children with developmental dyslexia were screened out and matched with 284 normal control children. (3) The Questionnaire on Influencing Factors of Chinese Reading Ability was used to survey the influencing factors of developmental dyslexia in these children. (4) The data on influencing factors were combined with the dyslexia susceptibility gene loci previously identified by the research group. Univariate analysis and binary Logistic regression were used to screen model variables, and the variables showing statistically significant differences were used to construct a Logistic regression model and a support vector machine model. Both models were internally validated by 5-fold cross-validation. Predictive performance was evaluated by comparing the area under the ROC curve, accuracy, sensitivity, specificity, and other indicators.
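The 5-fold cross-validation step can be illustrated with a minimal sketch of how 284 cases and 284 controls might be partitioned. The abstract does not say whether the folds were stratified or which software was used, so the function `stratified_folds`, the class-balanced assignment, and the random seed are all assumptions made for illustration.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class balance.

    Hypothetical helper: the abstract does not describe how folds were formed.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal indices round-robin so per-class fold sizes differ by at most one.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# 284 children with dyslexia (1) and 284 matched controls (0), as in the Methods.
labels = [1] * 284 + [0] * 284
folds = stratified_folds(labels)

for n, test_idx in enumerate(folds, start=1):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    cases = sum(labels[i] for i in test_idx)
    # A model would be fitted on train_idx and scored on test_idx at this point.
    print(f"fold {n}: train={len(train_idx)} test={len(test_idx)} cases={cases}")
```

Each child is held out exactly once, so every prediction used for evaluation comes from a model that did not see that child during fitting.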
Results: (1) The meta-analysis showed that the prevalence of developmental dyslexia was 4.40% (95% CI: 3.80%-5.10%). The influencing factors of developmental dyslexia were gender, father's occupation, mother's occupation, father's education level, mother's education level, per capita monthly family income, difficulty in completing homework, storytelling, encouraging children to read extracurricular books, time spent watching TV, buying children's favorite books, frequency of buying children's books, whether the child had a fixed reading time, frequency of parents' reading, annual spending on extracurricular books, active learning habits, and consistency of the father's and mother's educational attitudes. (2) Univariate analysis showed statistically significant differences between the two groups in parental education level, whether the child had a fixed reading time, time spent watching TV, difficulty in completing homework, storytelling, frequency of parents' reading, frequency of encouraging children to read extracurricular books, frequency of buying children's books, buying children's favorite books, and annual spending on extracurricular books (P<0.05). (3) Binary Logistic analysis showed that father's education level (OR=1.37), mother's education level (OR=1.49), time spent watching TV (OR=1.62), active learning habits (OR=3.03), difficulty in completing homework (OR=6.31), frequency of buying children's books (OR=1.30), whether the child had a fixed reading time (OR=2.23), and frequency of parents' reading (OR=1.54) were independent influencing factors for developmental dyslexia. (4) The models were built by incorporating these influencing factors together with data on the polymorphic loci (rs2652511, rs2975226, rs2710102, rs3779031, rs3756821) that showed significant differences in the research group's earlier work.
The Logistic regression model equation was Logit(P) = 0.406 × father's education level + 0.434 × mother's education level + 0.433 × TV watching time + 1.109 × active learning habits + 1.77 × difficulty in completing homework + 0.353 × frequency of buying children's books + 1.069 × whether the child has a fixed reading time + 0.527 × frequency of parents' reading + 0.425 × rs3779031. The area under the ROC curve was 0.877, accuracy 79%, sensitivity 78.71%, specificity 79.32%, and F1 score 79.35%. (5) A risk prediction model for developmental dyslexia in children was also established using a support vector machine. According to the variable importance analysis of the support vector machine, the top three risk factors were difficulty in completing homework, lack of an active learning habit, and long TV watching time. The area under the ROC curve was 0.88, accuracy 77.46%, sensitivity 83.10%, specificity 71.83%, and F1 score 78.67%.
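To make the relationship between the reported indicators concrete, the following sketch computes accuracy, sensitivity, specificity, and F1 score from confusion-matrix counts. The abstract does not report the counts themselves; the values below are an assumption, back-calculated on the premise that the support vector machine metrics were computed on pooled predictions for 284 cases and 284 controls. The function name `classification_metrics` is likewise illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # proportion of children with dyslexia correctly flagged
    specificity = tn / (tn + fp)   # proportion of control children correctly cleared
    precision = tp / (tp + fp)     # proportion of flagged children who have dyslexia
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Assumed counts (not reported in the abstract): 284 cases and 284 controls.
svm = classification_metrics(tp=236, fn=48, tn=204, fp=80)
for name, value in svm.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts the output matches the support vector machine figures above (77.46%, 83.10%, 71.83%, 78.67%). The Logistic regression figures do not correspond to whole-number counts on 284 children per group, which would be consistent with averaging across folds, although the abstract does not state this.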
Conclusion: (1) The risk factors for dyslexia are low parental education level, long TV watching time, lack of an active learning habit, difficulty in completing homework, irregular buying of children's books, no fixed reading time, low frequency of parents' reading, and the rs3779031 G allele. (2) Attention should focus on children who have difficulty in completing homework, lack an active learning habit, and watch TV for a long time, as these children are likely to be at high risk of dyslexia. (3) Both the support vector machine and the Logistic regression model have good predictive ability, but the overall predictive value of the support vector machine model is higher.