Some thoughts on Data Science

Disclaimer: I am an average Data Scientist. The following are just my opinions. They are not meant to be exhaustive, but rather thoughtful.

Structured thinking

In my opinion, this is the most important skill all Data Scientists need. It is the ability to take a problem and arrive at a solution in a methodical and constructive way. I think the best examples of structured thinking are writing a mathematical proof and doing system design or architecture: to arrive at the solution you need to build on each step. In practice, there are often multiple solutions, initial states can vary, and each solution path you take has its own advantages and disadvantages. So this is often not about the solution itself, but the process. It's definitely something that can be practiced and improved upon. Experience helps because it lets you pattern match between past and present problems. The experience doesn't have to be domain specific, as you can find structural similarities between problems across domains.

Linear Regression

Take the time to understand linear regression as if your life depends on it. Understand how different characteristics of the data affect a regression model. Is it better to have low or high variance among your independent variables? What does homoscedasticity even mean? What does it mean if the error term is not normally distributed?

My point is, go beyond the concept of fitting a straight line through some data. I like causal inference so I will cite a causal inference based regression write-up. Once you understand regression like a pro, you can think about non-linear problems, which by the way are most problems, and how decision trees work.
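To make the question about variance among the independent variables concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the true slope of 2.0, the noise level, the sample size, and the helper names `ols_slope` and `slope_spread`. It repeatedly fits a one-variable regression and measures how much the estimated slope moves around when x is tightly clustered versus widely spread.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Closed-form OLS slope for one regressor plus an intercept."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def slope_spread(x_sd, n=50, reps=500, true_slope=2.0, noise_sd=1.0, seed=0):
    """Standard deviation of the estimated slope across simulated datasets."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        # Hypothetical data-generating process: y = true_slope * x + noise
        xs = [rng.gauss(0.0, x_sd) for _ in range(n)]
        ys = [true_slope * x + rng.gauss(0.0, noise_sd) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return statistics.stdev(slopes)


narrow = slope_spread(x_sd=0.5)  # low variance in the independent variable
wide = slope_spread(x_sd=5.0)    # high variance in the independent variable

print(f"slope spread with low-variance x:  {narrow:.3f}")
print(f"slope spread with high-variance x: {wide:.3f}")
```

Under these assumptions the slope estimate should be noticeably more stable when x is widely spread, because the denominator of the slope formula (the sum of squared deviations of x) is larger. The sketch keeps the noise homoscedastic; making `noise_sd` depend on x would be one way to explore the homoscedasticity question.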

Causal Inference

There are two branches of Data Science: Machine Learning and Causal Inference. But if you think about the canonical structure of the problems, both involve solving for some E[Y | X]. Causal inference goes a step further: it is interested in E[Y | T_1] - E[Y | T_0]. In linear regression, interpreting the coefficients is an attempt at the second expression. And there are lots of other similarities between the two branches: Type 1 and Type 2 errors, False Positives and False Negatives for instance. I think every Data Scientist should think about causality, about how approaches to establishing causality fall short of truly doing so, and about how to guide a team given those shortfalls.
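As a small illustration of how a regression coefficient relates to E[Y | T_1] - E[Y | T_0], the sketch below simulates a randomized binary treatment and compares the difference in group means with the OLS slope on the treatment indicator. The data-generating process (a baseline of 1.0, a treatment effect of 3.0, unit noise) and all variable names are assumptions made up for the example.

```python
import random
import statistics

rng = random.Random(42)
n = 1000

# Hypothetical randomized experiment: treatment assigned by a coin flip.
treatment = [rng.randint(0, 1) for _ in range(n)]
# Assumed outcome model: baseline 1.0, treatment effect 3.0, unit noise.
outcome = [1.0 + 3.0 * t + rng.gauss(0.0, 1.0) for t in treatment]

# Sample version of E[Y | T_1] - E[Y | T_0]
treated = [y for t, y in zip(treatment, outcome) if t == 1]
control = [y for t, y in zip(treatment, outcome) if t == 0]
diff_in_means = statistics.fmean(treated) - statistics.fmean(control)

# OLS slope of the outcome on the treatment indicator (with an intercept)
t_bar = statistics.fmean(treatment)
y_bar = statistics.fmean(outcome)
slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(treatment, outcome)) / sum(
    (t - t_bar) ** 2 for t in treatment
)

print(f"difference in means: {diff_in_means:.4f}")
print(f"OLS coefficient:     {slope:.4f}")
```

With a single binary regressor the two numbers agree up to floating-point error. That is an algebraic identity, not a causal result: reading the number as a treatment effect leans on the randomized assignment assumed here, which is exactly what observational data does not give you.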

Within Causal Inference, you also have observational inference problems. The main distinction here is that in observational problems you don’t have control over data generation. You cannot control who gets treatment. But this is a topic for another day. I like this branch so much that I wanted to call it out.

Hypothesis Testing

I don't just mean understanding p-values here. I mean conceptualizing the framework of hypothesis testing. You want to prove a theory, and you start with the null hypothesis that this theory is false. You collect some data and then you want to see if you can reject that null hypothesis. Is there a threshold here? How does this relate to Type 1 and Type 2 errors? What does it mean for the business? Why can't you ever accept the null hypothesis? I think it's worth spending some time thinking about these.
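One way to build intuition for the threshold and the two error types is to simulate them. The sketch below is a rough illustration under stated assumptions: a two-sided z-test with a known standard deviation of 1, a threshold of 0.05, samples of size 30, and an alternative where the true mean is 0.5. The function names `z_test_p_value` and `rejection_rate` are made up for this example.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, null_mean=0.0, sd=1.0):
    """Two-sided p-value for H0: mean == null_mean, assuming a known sd."""
    z = (fmean(sample) - null_mean) / (sd / len(sample) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, alpha=0.05, n=30, reps=2000, seed=1):
    """Fraction of simulated experiments in which H0 (mean == 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / reps


# H0 is true: every rejection is a Type 1 error (false positive).
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false: every failure to reject is a Type 2 error (false negative).
power = rejection_rate(true_mean=0.5)
type_2_rate = 1.0 - power

print(f"Type 1 error rate: {type_1_rate:.3f}")
print(f"Power:             {power:.3f}")
print(f"Type 2 error rate: {type_2_rate:.3f}")
```

When the null is true, the rejection rate should land near the chosen threshold, which is what the threshold controls. When the null is false, some experiments still fail to reject it, which hints at why failing to reject is not the same as accepting the null.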

People

I can't recall a Data Science problem I couldn't think of a solution for. Weird flex, I know. But with people, I have had to navigate extremely uncertain risk-reward scenarios. I have many thoughts on this, but I would just like to highlight one.

In my opinion, the core requirement in working with people is balancing humility and confidence. There is absolutely no place for ego if you are an IC. Leaders can get away with it. Confidence comes from realizing that you are an expert in a certain field within a cross-functional team. And humility comes from realizing that everyone else is an expert in their respective field. But here is the key: you can't lean too far towards either. Too much confidence and people won't enjoy working with you, and you will burn more social capital when you make a mistake. Too much humility and you might be overwhelmed by stakeholder requests. An objective grounding in your own competence is very important, and so is creating space for subjective judgement.

This is all I have for now. If I think of more as I grow, I will update it.
