Methodology & Limitations

This page describes how ConductSpeech computes its metrics, where its normative data comes from, and what its known limitations are. It is intended for clinicians and researchers evaluating the tool.

Morpheme Counting (English)

ConductSpeech counts morphemes following standard clinical conventions as described in Brown (1973):

Inflectional suffixes (-ed, -ing, -s plural, -s possessive, -er comparative, -est superlative) each count as one morpheme.
Irregular past tense forms (went, ran, ate) count as one morpheme.
Catenatives (gonna, wanna, gotta) count as one morpheme.
Compounds (breakfast, playground) count as one morpheme.
Diminutives (doggie, birdie) count as one morpheme.
Fillers and mazes (um, uh, like) are excluded from morpheme counts.

Morpheme segmentation uses rule-based word and suffix handling. It is not a probabilistic model.
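The counting conventions above can be illustrated with a minimal Python sketch. This is not ConductSpeech's actual code: the word lists, the suffix order, and the name count_morphemes are assumptions made for illustration, and a real analyzer would need much larger lexicons. Because "like" is a filler only in some contexts, the sketch excludes only "um" and "uh".

```python
import re

# Hypothetical word lists for illustration only.
FILLERS = {"um", "uh"}  # "like" needs context, so it is not handled here
ONE_MORPHEME = {
    "went", "ran", "ate",        # irregular past tense forms
    "gonna", "wanna", "gotta",   # catenatives
    "breakfast", "playground",   # compounds
    "doggie", "birdie",          # diminutives
}
# Words that merely end in a suffix-like string and must not be split.
NO_SUFFIX = {"this", "bus", "yes", "his", "was", "thing", "sing", "king"}
# Longer suffixes are listed before shorter ones.
SUFFIXES = ("'s", "ing", "est", "ed", "er", "s")


def count_morphemes(utterance: str) -> int:
    """Count morphemes in one utterance using simple word and suffix rules."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", utterance.lower())
    total = 0
    for tok in tokens:
        if tok in FILLERS:
            continue  # fillers are excluded from the count
        total += 1  # every remaining word contributes one morpheme
        if tok in ONE_MORPHEME or tok in NO_SUFFIX:
            continue
        # Add one morpheme for an inflectional suffix, requiring a stem
        # of at least two letters so that "bed" is not read as "b" + "-ed".
        if any(tok.endswith(s) and len(tok) > len(s) + 1 for s in SUFFIXES):
            total += 1
    return total


# count_morphemes("um the doggie went home")  -> 4
# count_morphemes("he walked")                -> 3
# count_morphemes("daddy's dogs")             -> 4
```

A purely suffix-based rule over-counts words such as "water" or "never" unless they appear in an exception list, which is one reason rule-based counts should be spot-checked against hand coding.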

Morpheme Counting (Spanish)

The Spanish analyzer handles conjugation-rich morphology: verbal endings (present, preterite, imperfect, future, conditional, subjunctive), diminutives, pronominal clitics, plural and gender markers.

Limitation: This is a rule-based approximation, not a full Spanish morphological parser. It handles general Spanish conventions but does not make dialect-specific claims. Validation checks show agreement within approximately 0.15 MLU of hand-coded samples, but this has not been independently validated at scale.

Core Metrics

MLU (morphemes & words)

Total morphemes (or words) divided by total child utterances. Follows Brown's Stages conventions (Brown, 1973).
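As a worked illustration of the definition above, the following Python sketch computes MLU from per-utterance counts. The function name mlu and its input format are assumptions for this example; the same function gives MLU in morphemes or in words depending on which counts are supplied.

```python
def mlu(unit_counts):
    """Mean length of utterance: total units divided by number of utterances.

    unit_counts holds one count per child utterance (morphemes or words).
    """
    if not unit_counts:
        raise ValueError("MLU is undefined for a sample with no utterances")
    return sum(unit_counts) / len(unit_counts)


# Three child utterances with 4, 3, and 5 morphemes:
# mlu([4, 3, 5]) -> 12 / 3 = 4.0
```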

TTR (Type-Token Ratio)

Unique word types divided by total word tokens. A measure of lexical diversity, sensitive to sample length.

NDW (Number of Different Words)

Count of unique words across the sample. Less sensitive to sample length than TTR. Norms from Leadholm & Miller (1992).
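The two lexical diversity measures above, TTR and NDW, can be computed together. The Python sketch below assumes the sample is already tokenized into words and treats words case-insensitively; both choices, and the name lexical_diversity, are assumptions for illustration rather than a description of the tool's internals.

```python
def lexical_diversity(tokens):
    """Return (NDW, TTR) for a list of word tokens."""
    words = [t.lower() for t in tokens]
    ndw = len(set(words))  # number of different words (types)
    ttr = ndw / len(words) if words else 0.0  # types divided by tokens
    return ndw, ttr


# "The dog saw the cat" has 5 tokens and 4 types:
# lexical_diversity("The dog saw the cat".split()) -> (4, 0.8)
```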

PGU (Percent Grammatical Utterances)

Grammatically correct utterances divided by total utterances, times 100. Grammaticality is assessed using rule-based error detection for 11+ pattern categories (subject-verb agreement, tense marking, copula omission, pronoun case, article omission, etc.).
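The PGU formula can be sketched in Python as follows. The sketch assumes that an utterance counts as grammatical when the error detector flags nothing in it; this page does not spell out that criterion, so treat it, along with the name pgu and the boolean input format, as an assumption.

```python
def pgu(has_error):
    """Percent grammatical utterances.

    has_error holds one boolean per utterance: True if at least one
    error pattern was detected in that utterance.
    """
    total = len(has_error)
    if total == 0:
        raise ValueError("PGU is undefined for a sample with no utterances")
    grammatical = sum(1 for flagged in has_error if not flagged)
    return 100.0 * grammatical / total


# Four utterances, one flagged with an error:
# pgu([False, False, True, False]) -> 75.0
```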

Advanced Metrics

DSS (Lee, 1974) and IPSyn (Scarborough, 1990) are computed as lightweight rule-based approximations (“DSS-lite” and “IPSyn-lite”). They capture the major scoring categories but may differ from hand-scored reference implementations.

Limitation: For research-grade DSS and IPSyn accuracy, we recommend validating ConductSpeech output against hand-coded samples from your population. These metrics are best used for screening and progress monitoring, not as sole diagnostic criteria.
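One way to run the recommended validation is to score the same samples by hand and summarize the differences. The Python sketch below is a starting point under stated assumptions: the function name agreement_summary and the choice of summary statistics are illustrative, not part of ConductSpeech.

```python
def agreement_summary(tool_scores, hand_scores):
    """Summarize differences between tool output and hand-coded scores.

    Both lists must hold scores for the same samples in the same order.
    """
    if len(tool_scores) != len(hand_scores) or not tool_scores:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [t - h for t, h in zip(tool_scores, hand_scores)]
    n = len(diffs)
    return {
        "mean_diff": sum(diffs) / n,  # signed bias of the tool
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,
        "max_abs_diff": max(abs(d) for d in diffs),
    }
```

A mean difference far from zero suggests systematic over- or under-scoring, while a large maximum difference points to individual samples worth inspecting by hand.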

Normative Comparison

Age-stratified normative bands for MLU are derived from:

English: Brown's Stages (Brown, 1973) and Miller & Chapman (1981) age-MLU expectations for children 18–144 months.
Spanish: Romero CHILDES Spanish reference set (Romero et al., 2023), open-access and peer-reviewed.

Z-score estimates are computed against these reference ranges. The norms are population-level and should be interpreted in context: they do not account for dialect variation, bilingual development patterns, or individual circumstances.
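A z-score against an age band is the observed value minus the reference mean, divided by the reference standard deviation. The Python sketch below assumes each band stores a mean and a standard deviation; the NormBand structure is hypothetical, and the numbers in the usage comment are placeholders, not values from Brown (1973), Miller & Chapman (1981), or Romero et al. (2023).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormBand:
    """Hypothetical age band with a reference mean and standard deviation."""
    min_months: int
    max_months: int
    mean: float
    sd: float


def z_score(value, age_months, bands):
    """Return the z-score for value, or None if no band covers the age."""
    for band in bands:
        if band.min_months <= age_months <= band.max_months:
            if band.sd <= 0:
                raise ValueError("standard deviation must be positive")
            return (value - band.mean) / band.sd
    return None  # no normative data for this age


# Placeholder band, for illustration only:
# bands = [NormBand(30, 35, mean=3.0, sd=0.5)]
# z_score(2.5, 32, bands)  -> -1.0
# z_score(2.5, 200, bands) -> None
```

Returning None rather than extrapolating keeps the result honest when a child's age falls outside the available reference data.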

Narrative Scoring

ConductSpeech supports three narrative rubrics: NSS (Heilmann et al., 2010), ESS (Heilmann & Malone, 2014), and PSS (Nippold et al., 2005). Scoring uses AI-assisted review when available, with rule-based scoring as a backup.

Limitation: AI-scored narratives should be reviewed by a qualified clinician. The AI model may miss cultural context, dialectal narrative conventions, or subtle pragmatic elements. Story grammar element detection is heuristic-based and may not capture all story components in non-standard narratives.

Grammatical Error Detection

Error patterns are detected using rule-based heuristics for 11+ categories: subject-verb agreement, past tense marking, copula omission, auxiliary omission, pronoun case, article omission, plural marking, possessive marking, third-person singular, and others. These rules target common developmental error patterns in English.

Limitation: Rule-based error detection has both false positives and false negatives. It may flag dialectal features (e.g., African American English, Southern American English) as errors. Clinicians should interpret error counts in the context of the child's linguistic background. Sensitivity and specificity have not been formally evaluated against a gold-standard reference set.
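To make the idea of rule-based pattern detection concrete, here is a minimal Python sketch with two toy rules. The regular expressions, the rule names, and the function detect_errors are hypothetical and far simpler than a production rule set; they are not ConductSpeech's actual rules.

```python
import re

# Two toy rules for illustration only.
RULES = {
    # Subject pronoun directly followed by an -ing form ("he running").
    "auxiliary_omission": re.compile(
        r"\b(?:he|she|it|they|we|you|i)\s+\w+ing\b"
    ),
    # Object pronoun in subject position at the start ("me want cookie").
    "pronoun_case": re.compile(r"^(?:me|him|them)\s+\w+"),
}


def detect_errors(utterance):
    """Return the names of all toy rules that match the utterance."""
    text = utterance.lower().strip()
    return [name for name, pattern in RULES.items() if pattern.search(text)]


# detect_errors("He running fast")  -> ["auxiliary_omission"]
# detect_errors("He is running")    -> []
# detect_errors("Me want cookie")   -> ["pronoun_case"]
```

Even these two rules show the trade-off: the first would wrongly flag "they sing" because "sing" ends in "-ing", and it would flag zero-auxiliary constructions that are grammatical in some dialects.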

AI-Generated Reports

Clinical narrative reports and IEP goal suggestions are generated from computed metrics and anonymized transcript data. Raw audio and patient identifiers are not needed for draft report generation.

Limitation: AI-generated reports are drafts intended to save clinician time, not replace clinical judgment. All reports should be reviewed and edited by a qualified SLP before inclusion in official documentation (IEPs, medical records, evaluation reports).

Overall Limitations

ConductSpeech has not been independently validated in a peer-reviewed study. Clinical validation with gold-standard comparisons is planned.
Transcription accuracy depends on audio quality. Background noise, overlapping speakers, and low microphone volume degrade results.
Without Deepgram, speaker diarization (child vs. examiner) relies on heuristics and may misattribute utterances.
The tool is designed for English and Spanish. Other languages are not currently supported.
Normative data is not available for all age ranges or all populations.
ConductSpeech is a clinical support tool, not a diagnostic instrument. It should be used alongside professional clinical judgment.

References

Brown, R. (1973). A First Language: The Early Stages. Harvard University Press.
Miller, J. F., & Chapman, R. S. (1981). The relation between age and mean length of utterance in morphemes. Journal of Speech and Hearing Research, 24(2), 154–161.
Leadholm, B. J., & Miller, J. F. (1992). Language Sample Analysis: The Wisconsin Guide. Wisconsin Department of Public Instruction.
Lee, L. L. (1974). Developmental Sentence Analysis. Northwestern University Press.
Scarborough, H. S. (1990). Index of Productive Syntax. Applied Psycholinguistics, 11(1), 1–22.
Heilmann, J., Miller, J. F., Nockerts, A., & Dunaway, C. (2010). Properties of the Narrative Scoring Scheme. American Journal of Speech-Language Pathology, 19(2), 154–166.
Heilmann, J., & Malone, T. O. (2014). The rules of the game: Properties of a database of expository language samples. Language, Speech, and Hearing Services in Schools, 45(4), 277–290.
Nippold, M. A., Hesketh, L. J., Duthie, J. K., & Mansfield, T. C. (2005). Conversational versus expository discourse: A study of syntactic development in children, adolescents, and adults. Journal of Speech, Language, and Hearing Research, 48(5), 1048–1064.
Romero, S., et al. (2023). CHILDES Spanish Corpus. TalkBank/CHILDES. CC-BY.