How to Build Your Data Science Portfolio, My Startup Mistakes, & More


Hey friends,

Hope you're having a great week so far. Today, I'm excited for 2 reasons:

  • I finally wrote my first newsletter 🎉
  • I'm thrilled to share this newsletter with you, because it's like my weekly journal where I share 1 tip, 1 mistake, 1 learning, 1 book, and 1 quote that I've learned from my data science and startup journey.

To give you a bit of background, I graduated in 2014 with a degree in Physics.

Having found my passion in data science, I started my career as a data scientist, moving from the online gambling industry (a story for another time) to the semiconductor field, and then becoming a data science instructor.

Finally, I quit my job in March 2021 and started building my current startup - Staq - the #1 business banking API platform for Southeast Asia.

As I reflect on my journey, I made tons of mistakes but also learned valuable lessons along the way. This newsletter is a letter to my past self, and hopefully you'll take something away from it too.

Let's get started! 🚀


What's in the hub today?

  • Tip: How to build a data science portfolio
  • Mistake: I caused tech debt
  • Learning: Balance between urgency & importance
  • Book: Zero to One
  • Quote: Finding your purpose - the Ikigai way

1 Tip:

⭐️ How to Build a Data Science Portfolio?

People often ask me, "How to build a data science portfolio?"

I recently talked about the 6 steps I'd take to build my portfolio if I were starting from zero. The benefits of these 6 steps are:

  • You'll learn how to build an end-to-end data science project.
  • You'll attract the attention of recruiters and employers.
  • You can easily differentiate yourself from the rest.
  • You'll find it easier to land job interviews and offers for DS roles.

Here's how to build your data science portfolio, step by step:

​

Step 1: Find a social problem to solve

In a real working environment, most problems are not well-defined. They are vague. Therefore, companies prefer to hire data scientists who have dealt with real-world problems before.

Problems from Kaggle are already well-defined, so solving them teaches you little about dealing with real-world problems. In my opinion, tackling social problems is the best way to build this real working experience.

Here's how to find a social problem to solve:

  • The best social problems can be found around you. Think about what social problems you (or your friends) face in daily life.
For example, say I'm renting a house, and every month I want to forecast next month's electricity bill for budgeting purposes.

Voilà! I've found a social problem to solve. It's time to get some data. 😄

​

Step 2: Get the data using web scraping

Getting data from Kaggle is easy because it's handed to you. Unfortunately, in the real world, you won't have this luxury. Most of the time, you have to go and get the data yourself.

To get my historical electricity bill data, I'd need to do some simple web scraping from my billing account.

Here are the tools that I'd use for web scraping:
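Whatever tools you pick, the core of a scraper is turning HTML into records. Here's a minimal sketch of that parsing step using only Python's standard library. The table markup, the figures, and the names `BillTableParser` and `parse_bills` are all made up for illustration; a real billing portal will look different and will usually need a login step and an HTTP client as well.

```python
from html.parser import HTMLParser

# Hypothetical bill-history markup; a real utility portal will look different.
SAMPLE_HTML = """
<table id="bills">
  <tr><th>Month</th><th>Amount</th></tr>
  <tr><td>2021-01</td><td>$82.40</td></tr>
  <tr><td>2021-02</td><td>$75.10</td></tr>
  <tr><td>2021-03</td><td>$91.00</td></tr>
</table>
"""


class BillTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows have no <td> cells, so they are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def parse_bills(html):
    """Return a list of {"month", "amount"} dicts from the bill table."""
    parser = BillTableParser()
    parser.feed(html)
    return [{"month": m, "amount": float(a.lstrip("$"))} for m, a in parser.rows]


bills = parse_bills(SAMPLE_HTML)
print(bills)
```

The output is a plain list of dictionaries, which is easy to dump to JSON in the next step.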

​

Step 3: Store the data in a database on AWS (free tier)

Once I've scraped the data, I'll output it as a JSON file and store it on S3, since the AWS free tier includes up to 5 GB of S3 storage.

Why store the data in the cloud? Two reasons:

  • I'll need to retrieve data for analysis and ML model training later.
  • Most companies store their data in the cloud, so I want to build my skills in cloud computing. Again, the whole point of building my portfolio is to get real work experience.
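Here's a rough sketch of the upload and download round trip. The two helper functions call `put_object` and `get_object`, which are the method names a boto3 S3 client exposes. Everything else is an assumption for illustration: the bucket name, the key, the sample bills, and the `InMemoryS3` class, which is only a stand-in so the example runs without AWS credentials.

```python
import io
import json


def upload_bills(s3_client, bucket, key, bills):
    """Serialise the scraped bills to JSON and write them to S3."""
    body = json.dumps(bills).encode("utf-8")
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=body, ContentType="application/json"
    )


def download_bills(s3_client, bucket, key):
    """Read the JSON object back and parse it into Python objects."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())


class InMemoryS3:
    """Tiny stand-in for a boto3 S3 client so the sketch runs offline."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body, **kwargs):
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


# With real AWS credentials you would create the client with
# boto3.client("s3") instead of using the in-memory stand-in.
s3 = InMemoryS3()
bills = [
    {"month": "2021-01", "amount": 82.4},
    {"month": "2021-02", "amount": 75.1},
]
upload_bills(s3, "my-portfolio-bucket", "bills/history.json", bills)
restored = download_bills(s3, "my-portfolio-bucket", "bills/history.json")
print(restored == bills)
```

Passing the client in as a parameter keeps the helpers easy to test, and swapping in the real client later doesn't change them.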

​

Step 4: Extract the data, clean & analyse it, get insights

Finally, it's time to pull the JSON file from S3 for data cleaning and analysis. Here are my typical steps for analysing the data:

  1. Data cleaning - Convert the data from JSON format to dataframe format, remove unwanted data fields, etc.
  2. Exploratory Data Analysis (EDA) - Understand the data distribution, identify outliers using box plots, and do feature engineering.
  3. Data visualisation
  4. Get insights - Spot interesting trends, identify relevant features, and remove unwanted features.
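To make the cleaning and outlier steps above concrete, here's a small sketch using only the standard library (in practice you'd probably reach for pandas). The records and field names are invented. The outlier check uses the 1.5 x IQR fences, which is the usual convention behind box plot whiskers.

```python
import json
from statistics import quantiles

# Made-up raw records, shaped like what a scraper might have saved as JSON.
RAW_JSON = """[
  {"month": "2021-01", "amount": "$80.00", "account_id": "X1"},
  {"month": "2021-02", "amount": "$82.00", "account_id": "X1"},
  {"month": "2021-03", "amount": "$78.00", "account_id": "X1"},
  {"month": "2021-04", "amount": "$85.00", "account_id": "X1"},
  {"month": "2021-05", "amount": "$79.00", "account_id": "X1"},
  {"month": "2021-06", "amount": "$81.00", "account_id": "X1"},
  {"month": "2021-07", "amount": "$83.00", "account_id": "X1"},
  {"month": "2021-08", "amount": "$250.00", "account_id": "X1"}
]"""


def clean(records):
    """Keep only the fields we need and convert amounts to floats."""
    return [
        {"month": r["month"], "amount": float(r["amount"].lstrip("$"))}
        for r in records
    ]


def find_outliers(rows):
    """Return rows whose amount falls outside the 1.5 * IQR fences."""
    q1, _median, q3 = quantiles([r["amount"] for r in rows], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in rows if not low <= r["amount"] <= high]


rows = clean(json.loads(RAW_JSON))
outliers = find_outliers(rows)
print(outliers)
```

Whether an outlier like the August bill is a data error or a real spike is a judgment call, which is exactly the kind of insight Step 4 is meant to surface.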

​

Step 5: Build an ML model, wrap it in an API to output predictions

After doing all the groundwork, it's time to build an ML model. Once done, I'll deploy the model and wrap it in an API to predict next month's electricity bill.

Here are the steps I'd take:

  1. Build a baseline time series model (e.g. Naive, ARIMA).
  2. Build 2-3 different ML models and compare them with the baseline model.
  3. Pick the best-performing ML model based on the chosen metric (e.g. prediction accuracy).
  4. Deploy the trained ML model on Amazon SageMaker.
  5. Wrap the ML model in a REST API to output predictions.
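The first three steps above boil down to "score every candidate the same way, then keep the winner". Here's a minimal sketch of that loop. It makes a few assumptions: the bill amounts are made up, mean absolute error (MAE) is the chosen metric, and a 3-month moving average stands in for a real ML model so the example stays dependency-free.

```python
from statistics import mean

# Made-up monthly bill amounts, oldest first.
AMOUNTS = [80.0, 82.0, 78.0, 85.0, 79.0, 81.0, 83.0, 80.0, 84.0, 79.0, 82.0, 81.0]


def naive(history):
    """Baseline: next month equals the most recent month."""
    return history[-1]


def moving_average_3(history):
    """Candidate: next month equals the mean of the last 3 months."""
    return mean(history[-3:])


def walk_forward_mae(model, series, start=3):
    """Forecast each point from the data before it and average the errors."""
    errors = [abs(series[t] - model(series[:t])) for t in range(start, len(series))]
    return mean(errors)


MODELS = {"naive": naive, "moving_average_3": moving_average_3}
scores = {name: walk_forward_mae(fn, AMOUNTS) for name, fn in MODELS.items()}
best = min(scores, key=scores.get)  # lower MAE is better
next_month = MODELS[best](AMOUNTS)

print(scores)
print(best, round(next_month, 2))
```

Walk-forward evaluation matters for time series because each forecast only sees data from before the month it predicts, which is how the model will be used once deployed.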

​

Step 6: Build an end-to-end data & ML pipeline

Once Step 5 is done, I can automate the full workflow to run every month (web scraping, data cleaning and analysis, ML training, and updating my ML model) so that I always get an updated prediction of next month's electricity bill.

You can use Amazon EventBridge to trigger your web scraper in a Lambda function, and AWS Step Functions to orchestrate the full workflow (Steps 2-5).
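As a rough sketch of what that orchestration could look like, the snippet below builds a Step Functions state machine definition (written in Amazon States Language) that runs one Lambda task per stage, in order. The stage names and ARNs are placeholders I made up, and the one-Lambda-per-stage split is an assumption, not the only way to slice the pipeline.

```python
import json

# Hypothetical stage names and placeholder Lambda ARNs, one per stage (Steps 2-5).
STAGES = [
    ("ScrapeBills", "arn:aws:lambda:REGION:ACCOUNT_ID:function:scrape-bills"),
    ("StoreToS3", "arn:aws:lambda:REGION:ACCOUNT_ID:function:store-to-s3"),
    ("CleanAndAnalyse", "arn:aws:lambda:REGION:ACCOUNT_ID:function:clean-analyse"),
    ("TrainAndDeploy", "arn:aws:lambda:REGION:ACCOUNT_ID:function:train-deploy"),
]

# EventBridge schedule expression: 00:00 UTC on the 1st day of every month.
MONTHLY_SCHEDULE = "cron(0 0 1 * ? *)"


def build_state_machine(stages):
    """Chain the stages as sequential Task states; the last one ends the run."""
    states = {}
    for i, (name, arn) in enumerate(stages):
        state = {"Type": "Task", "Resource": arn}
        if i + 1 < len(stages):
            state["Next"] = stages[i + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Monthly electricity bill forecasting pipeline",
        "StartAt": stages[0][0],
        "States": states,
    }


definition = build_state_machine(STAGES)
print(json.dumps(definition, indent=2))
```

The printed JSON is what you'd use as the state machine definition, with the schedule expression attached to an EventBridge rule that starts it.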

​

TL;DR

  1. Find a social problem to solve
  2. Get the data using web scraping
  3. Store the data in a database on AWS (free tier)
  4. Extract the data, clean & analyse it, get insights
  5. Build an ML model, wrap it in an API to output predictions
  6. Build an end-to-end data & ML pipeline

​

By using this strategy, you'll be ahead of most aspiring data scientists who only have certificates or Titanic projects under their belts.

As you can see, these 6 steps will take some time before you can build a fully end-to-end data science portfolio - but trust me, it's worth it.

A great portfolio is 10x better than 5 toy projects that don't mirror any real-world work.

1 Mistake:

As a startup founder, I have limited resources, so speed of execution is everything. Because of that, I wanted to build things fast during our early stage, so I took shortcuts.

What did I do? I:

  • Did a lot of hard coding, instead of making the code robust.
  • Ignored some minor issues and put them in the backlog, instead of fixing them at the beginning.
  • Didn't plan the architecture well before building, which made the system hard to maintain and scale.

Over time, bugs arose and tech debt compounded. I ended up spending more time fixing things than actually building them. Not good.


1 Learning:

The compounded tech debt was painful when I started paying the price.

Here is what I've learned:

  • Find the balance between urgency and importance. You can build things the right way and still be fast enough.
  • Every week, allocate some time to fix issues and reduce tech debt. You can't remove tech debt 100%, but you can reduce it regularly to make your life easier.

1 Book:

​Zero to One: Notes on Startups, or How to Build the Future​

A must-read from Peter Thiel if you want to learn how to build a startup that lasts.

Here are a few of my takeaways after reading the book:

  • Create new technology and build new things that will make the future not just different, but better: going from 0 to 1.
  • The future won't happen on its own. Have "definite optimism" for the future. Make plans and work to make the future better, rather than waiting for it to happen naturally.
  • Every great business is built around a secret that's hidden from the outside. Find the secret, execute on it, and you'll win.

This book has changed how I approach and build Staq with a long-term view.

Whenever I'm in doubt, I come back to these reminders to make sure we're building for the future, not for short-term gains.

Have you read this book? What are your thoughts on it?


1 Quote:

Do what you love.
Do what you're good at.
Do what the world needs.
Do what you can be rewarded for.

From How to Ikigai by Tim Tamashiro.

Ikigai is the reason you get out of bed every morning. It's your purpose.

I was lost when I was in school. I studied Physics, but had no clue what I wanted to do in my life.

These steps helped me find my passion and purpose in data science. Here are 4 questions to help you find your purpose:

  • What do you love?
  • What are you good at?
  • What does the world need?
  • What do you get paid for?

Ask yourself these 4 questions today and let me know how it goes. 💜


That's all for today

Thanks for reading. I hope you enjoyed today's issue. More than that, I hope it has helped you in some way and brought you some peace of mind.

You can always write to me by simply replying to this newsletter and we can chat.

See you again next week.

- Admond

​


​


​

Admond Lee

Hi! Admond here 👋🏻 I am a data scientist currently building a tech startup. Sign up for Hustle Hub - my weekly newsletter where I share actionable data science career tips, mistakes and lessons learned from building a startup - directly to your inbox.
