
In my opinion, it is relatively straightforward to simulate the situation where expanded targeting is enabled and then directly surface the simulated impression gain to advertisers, so I am not sure why an ML model is needed here.

Thoughts and Theory


“Do I need a Ph.D. degree to be a Data Scientist?” This is a common question asked by many people interested in joining the field. Quite a few blogs explain why you do not need a Ph.D. degree to become a Data Scientist (e.g. here is one), and I think those arguments generally make sense. Still, there is something extra Ph.D. graduates develop during their multi-year research experience: the research mindset. Since I could not find a formal definition elsewhere, I will simply make one up:

The research mindset is the set of thinking patterns or methods commonly used by researchers to…

Learning-by-doing: a simple simulation can help you better understand complicated ideas like multithreading and multiprocessing.

Photo by hue12 photography on Unsplash

Python is a great general-purpose language with applications in various fields. However, sometimes you just wish it ran faster. One way to speed it up is to parallelize the work, with either multithreading or multiprocessing.

There are numerous great resources out there that illustrate both concepts. To avoid duplicating effort, here are a few I found very helpful.

In this article, I want to provide a simple simulation for anyone who wants to explore the concepts further and test them out on their own laptop. So here we go!

Simulation setup

At a…
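A minimal sketch of such a simulation follows. It assumes only Python 3's standard `concurrent.futures` module, and the post itself may set things up differently. `io_task` stands in for I/O-bound work by sleeping, `cpu_task` stands in for CPU-bound work with a pure-Python loop, and `run` times the same list of tasks executed serially, on a thread pool, or on a process pool. The function names, worker count, and task sizes are illustrative choices, not taken from the post.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def io_task(seconds):
    """Simulate an I/O-bound job (e.g. waiting on a network call) by sleeping."""
    time.sleep(seconds)
    return seconds


def cpu_task(n):
    """Simulate a CPU-bound job with a pure-Python loop (holds the GIL on standard CPython)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run(task, inputs, executor_cls=None, workers=4):
    """Apply task to every input and return (results, elapsed_seconds).

    executor_cls=None runs serially; otherwise pass ThreadPoolExecutor or
    ProcessPoolExecutor to run the same work in parallel.
    """
    start = time.perf_counter()
    if executor_cls is None:
        results = [task(x) for x in inputs]
    else:
        with executor_cls(max_workers=workers) as pool:
            # map() preserves input order, so results line up with inputs.
            results = list(pool.map(task, inputs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # The guard lets process pools safely re-import this module in workers.
    scenarios = [("I/O-bound", io_task, [0.2] * 4),
                 ("CPU-bound", cpu_task, [2_000_000] * 4)]
    strategies = [("serial", None),
                  ("threads", ThreadPoolExecutor),
                  ("processes", ProcessPoolExecutor)]
    for label, task, inputs in scenarios:
        for name, executor_cls in strategies:
            _, elapsed = run(task, inputs, executor_cls)
            print(f"{label:10} {name:10} {elapsed:.2f}s")
```

On standard CPython, threads should cut the I/O-bound time roughly by the number of workers, because a sleeping thread releases the GIL. For the CPU-bound loop, only the process pool is expected to help. Exact timings depend on your laptop, and process start-up adds some overhead to small jobs.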

My thoughts on how data scientists solve problems, along with a case study from one of my favorite projects at my first job

Photo by Volodymyr Hryshchenko on Unsplash

There are two myths about how data scientists solve problems. The first is that the problem already exists, so the only challenge for a data scientist is to pick an algorithm and put it into production. The second is that data scientists always try to leverage the most advanced algorithms, as if a fancier model means a better solution. While neither is completely groundless, they represent two common misunderstandings of how data scientists work: the first overemphasizes the “execution” side, and the second overstates the “algorithm” part.

Obviously, these myths are not how we actually solve problems. …

My son helps me to notice an unconscious egocentric bias and to become better at work

One day, I opened a new toy for my soon-to-be-3-year-old son. It was a ready-to-assemble airplane that came with tools like a screwdriver, bolts, and nuts. Once kids install the wings and propeller by themselves, they have an airplane!

My son’s airplane toy

It was (guilty) fun to watch my son start the “installation” and spin the screwdriver around randomly, trying to install the wings. After a short time, he turned to me for help: it obviously didn’t work, and he had started to realize that!

“Rotate it…

Sharing my thoughts on innovation, along with one of my favorite Data Science projects from 2014

Photo by AbsolutVision on Unsplash

Millions of innovations happen in the world every day. Innovations create new products, services, business models, technologies, and even new scientific fields; if there were no innovation, we would be living in a totally different and boring place. Innovation is also critical in the Data Science field: Data scientists transform data into actionable products/insights, and such transformation constantly requires one to innovate beyond the status quo.

Innovation is production or adoption, assimilation, and exploitation of a value-added novelty in economic and social spheres; renewal and enlargement of products, services, and markets; development of new methods of production; and the establishment…

Using Requests and BeautifulSoup, I extracted the historical New York Times Best Sellers (Business topic) lists to enrich my reading list

New Year’s Resolution: 12 books in 2020

I concede that I didn’t read many books in the past few years. Since the rise of MOOCs (Massive Open Online Courses) in 2012, I have spent most of my spare time on Coursera, edX, and LinkedIn Learning. Much of the MOOC content explains knowledge (e.g. programming techniques and statistics concepts) very well, and it greatly helped my early career growth; however, I always felt something was missing when relying only on MOOCs, though I could not tell what it was.

Today, at the end of 2019, after finishing “Drive: The Surprising Truth About What Motivates Us” and starting to enjoy “The…

The solo tree in Tongva Park, Santa Monica, CA (I took the photo during a lunch break back in 2014)

Git is a very popular version control system for tracking changes in computer files and coordinating work on those files among multiple people. It is widely used in Data Science projects to keep track of code and maintain parallel development. Git can be used in very complicated ways; however, as Data Scientists, we can keep it simple. In this post, I am going to walk through the main use cases for a “Solo Master”.

What is “Solo Master”?

When you use Git simply to keep your code safe, so you don’t go crazy after your laptop is broken or stolen, all changes are on…

Why I am writing this post

As someone who has been in the Data Science field for a while, I have received quite a few questions from enthusiastic graduating students asking “how do I become a data scientist?” or “what do data scientists do on a daily basis?” On the other hand, I occasionally hear people joke that the statement “Data Scientist is the Sexiest Job of the 21st Century” is not true anymore: much of a data scientist’s time can be spent on cumbersome tasks like running SQL to pull data and building dashboards for reporting, and (more frustrating) maintaining them! They ask what is…

Pan Wu

Data Science Manager @ Facebook
