PhD => Data Scientist

A few great posts about transitioning from academia to working in tech are

I finished my math PhD in May 2013, did a fantastic program called Insight Data Science in June and July, interviewed for a while, and started my current position at a small startup in November. Below are my own tips about making the transition from PhD to data scientist for those in fields like (pure) Math and theoretical Physics who suspect that data science is probably something they’d be good at, but, like me, did no programming, data analysis, or machine learning as part of their research. Figuring out how to sell yourself is very important, but articulating how your problem-solving and research skills are relevant in the large will only go so far: you have to amass some directly relevant skills and knowledge. Here are the things I think are most important to do and learn.

  • Programming: learn Python. It’s easy to get started, everyone knows it, and it will do pretty much anything you want: build a web app (Flask), interact with databases (MySQLdb), analyze data (pandas and NumPy), do machine learning (scikit-learn), or make a gif. Pandas gives you R-style data frames in Python, so don’t worry about learning R, too. For courses, try Google, MIT, or Coursera.
  • Databases: learn MySQL. SQL is the lingua franca of databases. It was designed to query relational databases, and even though nonrelational databases are the hippest-latest-freshest, SQL is the way that everyone knows how to query data: the first thing everyone wants to do with a new way of storing data is find a way to query it with SQL.
  • Projects: do some small ones. Take a look here, here, or here and try building a spam filter or a recommendation engine, for example. There are lots of public data sets, but it’s even better if you can figure out how to gather your own by, say, querying an API (requests) or scraping the web (Beautiful Soup).
  • Read: see what’s going on. Check out Hacker News and Data Tau, mathbabe, and the engineering blogs of tech companies; follow people and companies on Twitter; read research papers. Even simple things like counting get tricky with huge volumes of data, so you won’t get bored. Check out: reservoir sampling, MapReduce and mutual friends, probabilistic counting, Bloom filters.
  • Interact: go to MeetUps. There are a lot. It’s easy to meet new people, learn new things, and drink free beer. Email discussion groups are another great way to learn new ideas and ask questions. Take advantage of the fact that there are more than a handful of other people in the world who understand what you are working on!
  • The Basics: brush up on and/or learn basic probability & statistics and algorithms & data structures.
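
To give a taste of the "simple things get tricky" problems mentioned in the reading list, here is a minimal sketch of reservoir sampling: picking k items uniformly at random from a stream whose length you don't know in advance, using only k slots of memory. The function name `reservoir_sample` and its parameters are my own choices for illustration, and this is the basic textbook version rather than anything tuned for production.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    The stream is read once and only k items are kept in memory.
    `rng` is an optional random.Random instance (an illustrative
    parameter, handy for reproducible runs).
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1) by drawing
            # a slot index from 0..i (randint is inclusive on both ends).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 numbers from a "stream" of one million without storing it.
    print(reservoir_sample(range(1_000_000), 5))
```

If the stream has fewer than k items, the sketch simply returns all of them. Working out why every item ends up in the result with equal probability is a nice warm-up for the probability questions that come up in interviews.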

In the future, I’ll write about things I learned interviewing and why the position I chose is perfect for me. For now, here are my tips on where to start. This process can be daunting, but it’s also a lot of fun! Enjoy!