My main research interests lie in natural language processing (NLP), particularly applied NLP. My current research focuses on text simplification, which aims to reduce the complexity of text while maintaining the content.
Text simplification data sets
David Kauchak, Gondy Leroy and Alan Hogue (2017). Measuring Text Difficulty Using Parse-Tree Frequency. In Journal of the Association for Information Science and Technology.
Partha Mukherjee, Gondy Leroy, David Kauchak, Srinidhi Rajanarayanan, Damian Y. Romero Diaz, Nicole P. Yuan, Gail Pritchard, Sonia Colina (2017). NegAIT: A New Parser for Medical Text Simplification Using Morphological, Sentential and Double Negation. In Journal of Biomedical Informatics.
Yang Gu, Gondy Leroy and David Kauchak (2017). When Synonyms Are Not Enough: Optimal Parenthetical Insertion for Text Simplification. In American Medical Informatics Association (AMIA) Fall Symposium.
Partha Mukherjee, Gondy Leroy, David Kauchak, Brianda A. Navarrete, Damien Y. Diaz, and Sonia Colina (2017). The Role of Surface, Semantic and Grammatical Features on Simplification of Spanish Medical Texts: A User Study. In American Medical Informatics Association (AMIA) Fall Symposium.
Gondy Leroy, Brianda A. Navarrete, Sonia Colina and David Kauchak (2017). Spanish Text Simplification Using Term Familiarity: Applying Principles from English Text Simplification. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper).
Debra Revere, Partha Mukherjee, David Kauchak and Gondy Leroy (2017). Creating a Corpus Resource for Text Simplification Research and Development. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper).
David Kauchak, Gondy Leroy and Melissa Just (2016). Grammar Frequency and Simplification: When Intuition Fails. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper). Distinguished Poster Award.
David Kauchak and Gondy Leroy (2016). Moving Beyond Readability Metrics for Health-Related Text Simplification. IEEE IT Professional.
David Kauchak (2016). Pomona at SemEval-2016 Task 11: Predicting Word Complexity Based on Corpus Frequency. In Proceedings of International Workshop on Semantic Evaluation (SemEval).
Gondy Leroy, David Kauchak and Alan Hogue (2016). Effects on Text Simplification: Evaluation of Splitting Up Noun Phrases. Journal of Health Communication.
Colby Horn, Katie Manduca and David Kauchak (2014). Learning a Lexical Simplifier Using Wikipedia. In Proceedings of ACL (short paper).
David Kauchak, Obay Mouradi, Christopher Pentoney and Gondy Leroy (2014). Text Simplification Tools: Using Machine Learning to Discover Features that Identify Difficult Text. In Hawaii International Conference on System Sciences (HICSS).
Gondy Leroy and David Kauchak (2013). The Effect of Word Familiarity on Actual and Perceived Text Difficulty. In Journal of the American Medical Informatics Association.
Daniel Feblowitz and David Kauchak (2013). Sentence Simplification as Tree Transduction. In Proceedings of PITR (ACL Workshop).
David Kauchak (2013). Improving Text Simplification Language Modeling Using Unsimplified Text Data. In Proceedings of ACL. Associated data.
Gondy Leroy, James E. Endicott, David Kauchak, Obay Mouradi and Melissa Just (2013). User Evaluation of the Effects of a Text Simplification Algorithm Using Term Familiarity on Perception, Understanding, Learning, and Information Retention. In Journal of Medical Internet Research (JMIR).
Gondy Leroy, David Kauchak and Obay Mouradi (2013). A User-study Measuring the Effects of Lexical Simplification and Coherence Enhancement on Perceived and Actual Text Difficulty. In International Journal of Medical Informatics (IJMI).
Obay Mouradi, Gondy Leroy, David Kauchak and James E. Endicott (2013). Influence of Text and Participant Characteristics on Perceived and Actual Text Difficulty. In Hawaii International Conference on System Sciences (HICSS).
Gondy Leroy, James Endicott, Obay Mouradi, David Kauchak and Melissa Just (2012). Improving Perceived and Actual Text Difficulty for Health Information Consumers using Semi-Automated Methods. In American Medical Informatics Association (AMIA) Fall Symposium.
David Kauchak, Gondy Leroy and William Coster (2012). A Systematic Grammatical Analysis of Easy and Difficult Medical Text. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper).
William Coster and David Kauchak (2011). Learning to Simplify Sentences Using Wikipedia. In Proceedings of Text-To-Text Generation, ACL Workshop.
William Coster and David Kauchak (2011). Simple English Wikipedia: A New Text Simplification Task. In Proceedings of ACL (short paper). Associated data.
Guillermo Gomez-Hicks and David Kauchak (2011). Dynamic Game Difficulty Balancing for Backgammon. In Proceedings of ACM SouthEast.
David Kauchak (2010). A Course-long Information Retrieval Project. In Proceedings of Symposium on Educational Advances in Artificial Intelligence (EAAI). Associated resources available.
David Kauchak (2006). Contribution to Research on Machine Translation. Doctoral dissertation, University of California, San Diego.
David Kauchak and Regina Barzilay (2006). Paraphrasing for Automatic Evaluation. In Proceedings of HLT-NAACL.
Rasmus E. Madsen, David Kauchak and Charles Elkan (2005). Modeling Word Burstiness Using the Dirichlet Distribution. In Proceedings of the International Conference on Machine Learning (ICML'05).
David Kauchak and Francine Chen (2005). Feature-Based Segmentation of Narrative Documents. In Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, 32-39.
David Kauchak, Joseph Smarr and Charles Elkan (2004). Sources of Success for Boosted Wrapper Induction. In Journal of Machine Learning Research, 5, 499-527.
David Kauchak and Sanjoy Dasgupta (2003). An Iterative Improvement Procedure for Hierarchical Clustering. In Advances in Neural Information Processing Systems (NIPS).
David Kauchak and Charles Elkan (2003). Learning Rules to Improve a Machine Translation System. In Proceedings of the European Conference on Machine Learning (ECML).