Tell Me Something I Don't Know: Analyzing OkCupid Profiles

Juan Shishido
School of Information, University of California, Berkeley

Jaya Narasimhan
Department of Electrical Engineering and Computer Science, University of California, Berkeley

Matar Haller
Helen Wills Neuroscience Institute, University of California, Berkeley



In this paper, we present an analysis of 59,000 OkCupid user profiles that examines online self-presentation by combining natural language processing (NLP) with machine learning. We analyze word usage patterns by self-reported sex and drug usage status. In doing so, we review standard NLP techniques, cover several ways to represent text data, and explain topic modeling. We find that individuals in particular demographic groups self-present in consistent ways. Our results also suggest that users may unintentionally reveal demographic attributes in their online profiles.


natural language processing, machine learning, supervised learning, unsupervised learning, topic modeling, okcupid, online dating


