David Kauchak

Assistant Professor
Computer Science Department
Pomona College
224 Edmunds
Claremont, CA 91711
(909) 607-0473

dkauchakcs pomona edu

Fall office hours:
  Mon. 10:30am-12pm
  Thu. 11am-12pm
  Fri. 1:30-3pm
  and by appointment

My current schedule

Fall 2014 courses:
  CS 190 - Natural Language Processing
  CS 190 - CS Senior Seminar (section 1, section 3)

Older teaching:


My main research interests lie in natural language processing (NLP), particularly applied NLP. My current research focuses on text simplification, which aims to reduce the complexity of text while maintaining the content.

Text simplification data sets


Python Sea and Space Images

Resources for the paper "A Course-long Information Retrieval Project"

We built a search engine called "Bursti" in Fall 09 in the information retrieval course (cs160). Check it out. To find out more about how it works, see the white papers.

Some fun in the Fall 09 intro course studying strings. Less enthused 10 min. later :)

Older projects

Winter 2005: Statistical Machine Translation Tutorial Resources

Useful software, data, etc

Publications

Colby Horn, Katie Manduca and David Kauchak (2014). Learning a Lexical Simplifier Using Wikipedia. In Proceedings of ACL (short paper).

David Kauchak, Obay Mouradi, Christopher Pentoney and Gondy Leroy (2014). Text Simplification Tools: Using Machine Learning to Discover Features that Identify Difficult Text. In Hawaii International Conference on System Sciences (HICSS).

Gondy Leroy and David Kauchak (2013). The Effect of Word Familiarity on Actual and Perceived Text Difficulty. In Journal of the American Medical Informatics Association.

Daniel Feblowitz and David Kauchak (2013). Sentence Simplification as Tree Transduction. In Proceedings of PITR (ACL Workshop).

David Kauchak (2013). Improving Text Simplification Language Modeling Using Unsimplified Text Data. In Proceedings of ACL. Associated data.

Gondy Leroy, James E. Endicott, David Kauchak, Obay Mouradi and Melissa Just (2013). User Evaluation of the Effects of a Text Simplification Algorithm Using Term Familiarity on Perception, Understanding, Learning, and Information Retention. In Journal of Medical Internet Research (JMIR).

Gondy Leroy, David Kauchak and Obay Mouradi (2013). A User-study Measuring the Effects of Lexical Simplification and Coherence Enhancement on Perceived and Actual Text Difficulty. In International Journal of Medical Informatics (IJMI).

Obay Mouradi, Gondy Leroy, David Kauchak and James E. Endicott (2013). Influence of Text and Participant Characteristics on Perceived and Actual Text Difficulty. In Hawaii International Conference on System Sciences (HICSS).

Gondy Leroy, James Endicott, Obay Mouradi, David Kauchak and Melissa Just (2012). Improving Perceived and Actual Text Difficulty for Health Information Consumers using Semi-Automated Methods. In American Medical Informatics Association (AMIA) Fall Symposium.

David Kauchak, Gondy Leroy and William Coster (2012). A Systematic Grammatical Analysis of Easy and Difficult Medical Text. In American Medical Informatics Association (AMIA) Fall Symposium (poster paper).

William Coster and David Kauchak (2011). Learning to Simplify Sentences Using Wikipedia. In Proceedings of Text-To-Text Generation, ACL Workshop.

William Coster and David Kauchak (2011). Simple English Wikipedia: A New Text Simplification Task. In Proceedings of ACL (short paper). Associated data.

Guillermo Gomez-Hicks and David Kauchak (2011). Dynamic Game Difficulty Balancing for Backgammon. In Proceedings of ACM SouthEast.

David Kauchak (2010). A Course-long Information Retrieval Project. In Proceedings of Symposium on Educational Advances in Artificial Intelligence (EAAI). Associated resources available.

David Kauchak (2006). Contributions to Research on Machine Translation. Doctoral dissertation, University of California, San Diego.

David Kauchak and Regina Barzilay (2006). Paraphrasing for Automatic Evaluation. In Proceedings of HLT-NAACL.

Rasmus E. Madsen, David Kauchak and Charles Elkan (2005). Modeling Word Burstiness Using the Dirichlet Distribution. In Proceedings of the Twenty-Second International Conference on Machine Learning (ICML'05).

David Kauchak and Francine Chen (2005). Feature-Based Segmentation of Narrative Documents. In Proc. of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, 32-39.

David Kauchak, Joseph Smarr and Charles Elkan (2004). Sources of Success for Boosted Wrapper Induction. In Journal of Machine Learning Research, 5, 499-527.

David Kauchak and Sanjoy Dasgupta (2003). An Iterative Improvement Procedure for Hierarchical Clustering. In Advances in Neural Information Processing Systems (NIPS).

David Kauchak and Charles Elkan (2003). Learning Rules to Improve a Machine Translation System. In Proceedings of the European Conference on Machine Learning (ECML).

David Kauchak, Joseph Smarr and Charles Elkan (2002). Sources of Success for Information Extraction Methods. Technical Report No. CS2002-0696, UCSD.