Tobin Pre-Doctoral Fellowship

Investigating Social Inequality Using Big Data and Machine Learning Methods



Song Ma



Social inequality is a key issue facing our society. What factors give rise to and perpetuate social inequality, and how can we reduce it? These questions attract researchers from fields such as economics, sociology, psychology, and political science, as well as policymakers and practitioners from different backgrounds who aim to improve social equality and welfare. Despite much progress in recent research, tracing these factors is still difficult: they often influence various dimensions of everyday life in subtle ways, presenting challenges to data collection, measurement, and analysis, which in turn impose limits on research.

The proposed research will empirically investigate the sources of social inequality and explore potential solutions by collecting and using novel big data and by conducting field experiments, all assisted by machine learning (ML) and AI technologies. The research will focus on four areas: (i) inequality in higher education and access to skills and knowledge (using large-scale data on university student records, course catalogs, and syllabus texts), (ii) inequality in financial markets (using financial advertisement video data and video processing techniques), (iii) inequality in innovation and startup opportunities (combining detailed startup funding application documents and their pitch recordings), and (iv) technology (ML/AI)-based interventions that may improve equality in the labor market. In exploring these dimensions, my collaborators, across different universities and fields, and I will examine different forms of social inequality such as gender inequality, racial inequality, and social mobility across different income groups.

The research program's main methodological innovation is to build and share new big data sets and to develop new big data and ML-based methods for other researchers. To do so, we will collect data sets of texts, images, and videos and will develop a comprehensive set of empirical tools. The proposed research is also embedded in an education plan that mentors young researchers (including predoctoral research associates and Ph.D. students), disseminates findings to policymakers, and engages practitioners interested in using the research findings to change real-world practices and to improve social equality.



Candidates should have quantitative and coding skills, especially experience in general-purpose languages such as Python or Julia and statistical languages such as Stata or R.

  • Candidates are expected to perform large-scale textual analysis, so exposure to related methodologies is a big plus.
  • Exposure to or knowledge of machine learning is a plus but not required. The candidate should be open to learning such skills over the research period.
  • The work will primarily use Linux, so exposure to Linux and cluster computing is strongly recommended, but training can be provided if needed.
  • Preference will be given to detail-oriented applicants.

Candidates need not be economics majors, though they should have experience with economics. We welcome applicants from other fields such as, but not limited to, computer science, engineering, mathematics, political science, psychology, and statistics. A love of working with data—cleaning it, understanding it, and presenting it in enlightening ways—is essential for this position.