In this market, there is no room for average — that’s just the reality.
If you want your portfolio to help you bridge the experience gap and break into data science, then you’ll have to start by understanding what will make you stand out in the eyes of hiring managers.
They won’t be impressed by your ability to code alone. They’re looking for problem-solving, creativity, and proof that you can tackle real-world challenges.
And there is a huge difference between projects for learning and projects for showing.
That means that although overused datasets like Titanic survivors or the NYC Taxi Trip Record dataset may be helpful for learning, including them in your portfolio projects sends the wrong message.
It screams “template project” and doesn’t show off the skills that make you stand out.
If you want to impress, your portfolio needs to demonstrate that you can handle messy, realistic datasets and solve problems that matter.
In this article, I will give you better data alternatives that will help you elevate your portfolio projects and get you one step closer to your dream data science job.
1 — Public APIs
Tapping into APIs allows you to develop projects with real-world applications and showcase key skills like data extraction and preprocessing.
Here are 5 of my personal favorites:
1. The YouTube API
Analyze trends in video views, comments, and upload patterns. Explore how video metadata like titles, tags, and descriptions contribute to a video’s popularity. The YouTube API is perfect for a project on viral content trends.
2. The SerpAPI
This is a powerful tool for exploring search engine trends, for example by analyzing Google Trends data. You can use it to investigate how search interest evolves over time for specific keywords or to identify seasonal spikes in search activity. The SerpAPI is ideal for projects that blend marketing insights with data analysis.
3. The Unsplash API
Work with beautiful image data by analyzing trends in image searches, usage, or popular photography styles. The Unsplash API is perfect for combining data science with visual storytelling.
4. The Spotify API
Dive into music analytics by exploring song popularity by genre, user-created playlists, or artist performance over time. Using the Spotify API in a project could involve discovering hidden trends in listening habits or building a music recommendation engine.
5. The Reddit API
Use it to analyze discussions and trends across various subreddits. You can explore community engagement, identify emerging topics, or perform sentiment analysis on popular threads. The Reddit API is perfect for text analysis and gaining insights into online behavior.
💡 Remember, these are only suggestions. There are a lot of other great public APIs out there, and knowing how to work with them is a useful skill for data scientists.
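Whichever API you pick, the extraction and preprocessing steps tend to look similar: request a page of results, follow the pagination token, and flatten the nested JSON into rows. Here is a minimal sketch of that pattern using only the standard library. The field names `items` and `nextPageToken`, the `FAKE_PAGES` data, and the helper names are assumptions for illustration, so check your API's documentation for its actual response shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url, params):
    """GET `url` with query parameters and decode the JSON body."""
    with urlopen(f"{url}?{urlencode(params)}", timeout=10) as resp:
        return json.load(resp)


def collect_pages(fetch_page, max_pages=5):
    """Call fetch_page(token) repeatedly, following the pagination token.

    Assumes each page is a dict with an "items" list and an optional
    "nextPageToken" (hypothetical names; adjust them to your API).
    """
    items, token = [], None
    for _ in range(max_pages):
        page = fetch_page(token)
        items.extend(page.get("items", []))
        token = page.get("nextPageToken")
        if not token:
            break
    return items


def flatten(record, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Offline stand-in for a real API so the sketch runs without a key.
FAKE_PAGES = {
    None: {
        "items": [{"id": "a1", "stats": {"views": 120, "comments": 4}}],
        "nextPageToken": "p2",
    },
    "p2": {"items": [{"id": "b2", "stats": {"views": 75, "comments": 0}}]},
}

rows = [flatten(item) for item in collect_pages(lambda token: FAKE_PAGES[token])]
print(rows)
```

Against a real API, you would replace the lambda with a function that calls `fetch_json` with your endpoint, your API key, and the current token (leaving the token out on the first request). The flat rows can then go straight into a CSV file or a DataFrame.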
2 — Web scraping
Although this option requires a bit more work, it teaches you how to gather unstructured data from websites, a skill that comes in handy in many industries where APIs aren’t available. Here are a few project ideas:
Scrape data on products from an e-commerce site to analyze pricing trends.
Gather reviews from a platform like Yelp to identify customer sentiment trends.
Scrape housing listings to explore market trends in specific cities.
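To make the first idea concrete, here is a minimal sketch that pulls product names and prices out of a listing page with Python's built-in `html.parser`. The `SAMPLE_HTML` markup and the `product`, `name`, and `price` class names are made up for illustration. A real site will have its own structure, and you should check its terms of service and robots.txt before scraping it.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded product listing page.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Desk lamp</span><span class="price">$24.99</span></li>
  <li class="product"><span class="name">Monitor stand</span><span class="price">$1,049.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collect the text of name/price spans inside each product item."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "product" in classes:
            self.products.append({})
        elif tag == "span" and self.products:
            for field in ("name", "price"):
                if field in classes:
                    self._field = field

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self.products[-1][self._field] = data.strip()


def parse_price(text):
    """Convert a string like '$1,049.00' to a float."""
    return float(text.replace("$", "").replace(",", ""))


def scrape_products(html):
    parser = ProductParser()
    parser.feed(html)
    return [
        {"name": p["name"], "price": parse_price(p["price"])}
        for p in parser.products
        if "name" in p and "price" in p
    ]


products = scrape_products(SAMPLE_HTML)
print(products)
```

The cleaning step (`parse_price`) is where much of the real work hides: scraped values arrive as text with currency symbols, separators, and missing fields, which is exactly the kind of mess a portfolio project can show you handling.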
💡 If you need some inspiration, here is an example of a portfolio project I worked on a couple of years ago that involved web scraping.
3 — Personal account data
Although the options I mentioned are great, you may not have to look too far. Your own accounts could offer unique and practical data sources to analyze for your next project.
Let me give you some examples:
Netflix: Analyze your own viewing habits and create a dashboard to visualize binge-watching trends. Go here to request your personal usage data.
Strava: Explore workout data and identify patterns in performance improvement. This article explains how to export your own data.
Gmail: Analyze your email patterns to optimize productivity. Go here to export most of your Google personal data.
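Most of these exports arrive as CSV or JSON files, so a first analysis can be very short. Here is a minimal sketch that counts views per day and flags "binge" days. The `Title` and `Date` column names, the date format, the sample rows, and the threshold of three views are all assumptions for illustration, so adapt them to whatever your own export actually contains.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample standing in for an exported viewing-history file.
SAMPLE_EXPORT = """Title,Date
Show A: Season 1: Episode 1,2024-01-05
Show A: Season 1: Episode 2,2024-01-05
Show A: Season 1: Episode 3,2024-01-05
Movie B,2024-01-06
Show A: Season 1: Episode 4,2024-01-07
"""


def views_per_day(csv_file, date_format="%Y-%m-%d"):
    """Count how many rows (views) fall on each calendar day."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        day = datetime.strptime(row["Date"], date_format).date()
        counts[day] += 1
    return counts


def binge_days(counts, threshold=3):
    """Return the days with at least `threshold` views, in date order."""
    return sorted(day for day, n in counts.items() if n >= threshold)


counts = views_per_day(io.StringIO(SAMPLE_EXPORT))
print(binge_days(counts))
```

To run it on your own data, open the exported file with `open(path, newline="")` and pass the file object to `views_per_day` in place of the `io.StringIO` sample.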
If you’re serious about landing a data science job, your portfolio needs to go beyond textbook problems. Focus on real-world challenges, messy datasets, and unique insights.
Hiring managers don’t just want to see that you can code—they want to know that you can think like a data scientist.
If you need more, I have written a thorough guide on how to build a competitive data science portfolio 👇
And please, ditch the Titanic dataset. Find real problems to solve and let your portfolio show what you’re truly capable of.
Thank you for reading! In upcoming articles, I will share how to leverage LLMs to create synthetic datasets for your next project.
See you next week!
- Andres
Wonderful post, Andres. I feel people get bound to the datasets that they see available on Kaggle, ready to use, and just build projects around them. I'm definitely not implying Kaggle doesn’t have interesting datasets.
However, techniques like using APIs and web scraping can be helpful to extract data for more unique topics/problem statements.
Reddit locked down their API a while ago. Is it free now?