My main research interests lie in natural language processing (NLP), particularly applied NLP. My current research focuses on text simplification, which aims to reduce the complexity of text while maintaining the content.

Text simplification data sets

Python Sea and Space Images

Resources for the paper "A Course-long Information Retrieval Project"

In Fall 2009, we built a search engine called "Bursti" in the information retrieval course (cs160). Check it out. To find out more about how it works, see the white papers.

Some fun in the Fall 2009 intro course studying strings. Less enthused 10 min. later :)

Older projects

Winter 2005: Statistical Machine Translation Tutorial Resources

Useful software, data, etc.


Partha Mukherjee, Gondy Leroy, David Kauchak, Srinidhi Rajanarayanan, Damian Y. Romero Diaz, Nicole P. Yuan, Gail Pritchard, Sonia Colina (2017). NegAIT: A New Parser for Medical Text Simplification Using Morphological, Sentential and Double Negation. In Journal of Biomedical Informatics.

David Kauchak, Gondy Leroy and Melissa Just (2016). Grammar Frequency and Simplification: When Intuition Fails. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper). Distinguished Poster Award.

David Kauchak and Gondy Leroy (2016). Moving Beyond Readability Metrics for Health-Related Text Simplification. IEEE IT Professional.

David Kauchak (2016). Pomona at SemEval-2016 Task 11: Predicting Word Complexity Based on Corpus Frequency. In Proceedings of International Workshop on Semantic Evaluation (SemEval).

Gondy Leroy, David Kauchak and Alan Hogue (2016). Effects on Text Simplification: Evaluation of Splitting Up Noun Phrases. Journal of Health Communication.

Colby Horn, Katie Manduca and David Kauchak (2014). Learning a Lexical Simplifier Using Wikipedia. In Proceedings of ACL (short paper).

David Kauchak, Obay Mouradi, Christopher Pentoney and Gondy Leroy (2014). Text Simplification Tools: Using Machine Learning to Discover Features that Identify Difficult Text. In Hawaii International Conference on System Sciences (HICSS).

Gondy Leroy and David Kauchak (2013). The Effect of Word Familiarity on Actual and Perceived Text Difficulty. In Journal of the American Medical Informatics Association.

Daniel Feblowitz and David Kauchak (2013). Sentence Simplification as Tree Transduction. In Proceedings of PITR (ACL Workshop).

David Kauchak (2013). Improving Text Simplification Language Modeling Using Unsimplified Text Data. In Proceedings of ACL. Associated data.

Gondy Leroy, James E. Endicott, David Kauchak, Obay Mouradi and Melissa Just (2013). User Evaluation of the Effects of a Text Simplification Algorithm Using Term Familiarity on Perception, Understanding, Learning, and Information Retention. In Journal of Medical Internet Research (JMIR).

Gondy Leroy, David Kauchak and Obay Mouradi (2013). A User-study Measuring the Effects of Lexical Simplification and Coherence Enhancement on Perceived and Actual Text Difficulty. In International Journal of Medical Informatics (IJMI).

Obay Mouradi, Gondy Leroy, David Kauchak and James E. Endicott (2013). Influence of Text and Participant Characteristics on Perceived and Actual Text Difficulty. In Hawaii International Conference on System Sciences (HICSS).

Gondy Leroy, James Endicott, Obay Mouradi, David Kauchak and Melissa Just (2012). Improving Perceived and Actual Text Difficulty for Health Information Consumers using Semi-Automated Methods. In American Medical Informatics Association (AMIA) Fall Symposium.

David Kauchak, Gondy Leroy and William Coster (2012). A Systematic Grammatical Analysis of Easy and Difficult Medical Text. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper).

William Coster and David Kauchak (2011). Learning to Simplify Sentences Using Wikipedia. In Proceedings of Text-To-Text Generation, ACL Workshop.

William Coster and David Kauchak (2011). Simple English Wikipedia: A New Text Simplification Task. In Proceedings of ACL (short paper). Associated data.

Guillermo Gomez-Hicks and David Kauchak (2011). Dynamic Game Difficulty Balancing for Backgammon. In Proceedings of ACM SouthEast.

David Kauchak (2010). A Course-long Information Retrieval Project. In Proceedings of Symposium on Educational Advances in Artificial Intelligence (EAAI). Associated resources available.

David Kauchak (2006). Contribution to Research on Machine Translation. Doctoral dissertation, University of California, San Diego.

David Kauchak and Regina Barzilay (2006). Paraphrasing for Automatic Evaluation. In Proceedings of HLT-NAACL.

Rasmus E. Madsen, David Kauchak and Charles Elkan (2005). Modeling Word Burstiness Using the Dirichlet Distribution. In Proceedings of the International Conference on Machine Learning (ICML'05).

David Kauchak and Francine Chen (2005). Feature-Based Segmentation of Narrative Documents. In Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, 32-39.

David Kauchak, Joseph Smarr and Charles Elkan (2004). Sources of Success for Boosted Wrapper Induction. In Journal of Machine Learning Research, 5, 499-527.

David Kauchak and Sanjoy Dasgupta (2003). An Iterative Improvement Procedure for Hierarchical Clustering. In Advances in Neural Information Processing Systems (NIPS).

David Kauchak and Charles Elkan (2003). Learning Rules to Improve a Machine Translation System. In Proceedings of the European Conference on Machine Learning (ECML).