How I Set Up ML Projects That Scale (My Template + Pro Tips)

Lessons from years of deploying successful Data Science projects

Andres Vourakis

Jun 11, 2025

That’s a big misconception!

Coding complexity isn’t a sign of experience or skill.

You know what is?

Structure. Project design. Thinking ahead.

Experienced Data Scientists don’t just write better code; they set up better projects.

Projects that scale, that teammates can jump into, that don’t fall apart when it’s time to ship.

If you want to build something collaborative, maintainable, and most importantly, deployment-ready, you have to think beyond notebooks or complex scripts.

So in this article, I want to show you how to start setting up your projects like an experienced Data Scientist, not just to impress but also to make your life easier.

The “basics” can only get you so far…

I think we can all agree that at the very least, a good project structure should look something like this:

But this is only good for working with notebooks, and in the end, a notebook is not the final product; it’s just a thinking space.

So as projects get larger and more complex, especially if you’re working in teams or preparing for deployment, you’ll need to go further.

You’ll need to:

Manage environments more reliably (e.g. with a Dockerfile)
Separate logic into scripts and modules (src/)
Add tests to avoid breaking things (tests/)
Track and version data or models (e.g. using DVC or MLflow)
Use configs to parameterize experiments (config.yaml)
Set up lightweight automation (e.g. Makefile or shell scripts)
Document decisions clearly (expand the README, add docs/)
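
To make the config item concrete, here’s a minimal sketch of parameterizing an experiment from a config file. The TrainConfig name and its keys (learning_rate, n_estimators, test_size, seed) are made up for illustration. In a real project the dictionary would come from parsing config.yaml with a YAML library (for example PyYAML’s yaml.safe_load); that step is left out here to keep the sketch dependency-free.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical experiment parameters; the names are illustrative only."""

    learning_rate: float = 0.1
    n_estimators: int = 100
    test_size: float = 0.2
    seed: int = 42


def load_config(raw: dict) -> TrainConfig:
    """Build a TrainConfig from a parsed config file, rejecting unknown keys."""
    allowed = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - allowed
    if unknown:
        # Fail loudly on typos such as "learning_rat" instead of silently
        # falling back to the default value.
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return TrainConfig(**raw)


# Stand-in for the dictionary a YAML parser would return for config.yaml.
raw_config = {"learning_rate": 0.05, "n_estimators": 300}
config = load_config(raw_config)
print(config)
```

Keeping the parameters in one typed object means a new experiment is just a new config file, and a misspelled key fails right away instead of quietly training with defaults.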

One of the best tools I’ve found for setting all of this up automatically is Cookiecutter.

Quick overview of Cookiecutter

Cookiecutter lets you generate a full project structure from a template by answering a few command line interface (CLI) prompts.
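
If the idea of “a project from a template” feels abstract, here’s a toy sketch of what template-based scaffolding boils down to. To be clear, this is not how Cookiecutter is implemented (it uses Jinja2 templates and a much richer set of options); the TEMPLATE dictionary, the scaffold function, and the example answers are all hypothetical.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical mini-template: relative path -> file contents.
# $project_name and $author are placeholders filled in from the answers.
TEMPLATE = {
    "README.md": "# $project_name\n\nMaintained by $author.\n",
    "config.yaml": "project: $project_name\n",
    "src/__init__.py": "",
    "tests/__init__.py": "",
}


def scaffold(root: Path, answers: dict[str, str]) -> Path:
    """Create the project folder under root and render every template file."""
    project_dir = root / answers["project_name"]
    for rel_path, body in TEMPLATE.items():
        target = project_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(Template(body).substitute(answers))
    return project_dir


# In a real tool these answers would come from the CLI prompts.
with tempfile.TemporaryDirectory() as tmp:
    project = scaffold(
        Path(tmp), {"project_name": "churn-model", "author": "Jane Doe"}
    )
    created = sorted(
        p.relative_to(project).as_posix()
        for p in project.rglob("*")
        if p.is_file()
    )
    readme_text = (project / "README.md").read_text()
    print(created)
    print(readme_text)
```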

The version of Cookiecutter I prefer is called Cookiecutter Data Science, which is designed for organizing data science projects following best practices.

Installing it and running it is very easy:

pipx install cookiecutter-data-science

Then you just run the command ccds from the parent directory where you want your project:

Preview of Cookiecutter Data Science CLI.

After that, you just follow the prompts and it creates a folder with everything set up: folders, config files, and boilerplate code included.

Great for getting started fast without setting up everything from scratch.

💡 By the way, using this tool doesn’t require a specific language or framework. Cookiecutter is agnostic to your tooling.

I won’t go into more detail since there is lots of great documentation on this website.

Instead, let me give you a few more tips to help you improve your workflow!

🛠️ 3 pro tips for a smoother workflow

Use a .env.example
Always include a .env.example file to make it clear which environment variables are needed. It helps others get set up faster and reduces config errors.
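
One way to get even more out of this file is to check a local .env against it. The sketch below assumes a simple KEY=VALUE format with # comments; the function names and the example variables are illustrative, not part of any particular library.

```python
import tempfile
from pathlib import Path


def read_env_keys(path: Path) -> set[str]:
    """Return the variable names found in a simple KEY=VALUE env file."""
    keys = set()
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, _ = line.partition("=")
        if sep:
            keys.add(key.strip())
    return keys


def missing_env_keys(example: Path, actual: Path) -> set[str]:
    """Names listed in .env.example that the local .env does not define."""
    return read_env_keys(example) - read_env_keys(actual)


# Demo with throwaway files; in a project these would live at the repo root.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / ".env.example"
    actual = Path(tmp) / ".env"
    example.write_text(
        "# Copy to .env and fill in\nDATABASE_URL=\nMLFLOW_TRACKING_URI=\n"
    )
    actual.write_text("DATABASE_URL=postgresql://localhost/dev\n")
    missing = missing_env_keys(example, actual)
    print(sorted(missing))
```

Running a check like this at startup turns “it works on my machine” config problems into a clear message about which variable is missing.
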

Set up pre-commit hooks
Add pre-commit early to auto-format code and catch simple issues before they make it into version control. It keeps your codebase clean from day one.
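
To give a feel for what a “simple issue” check looks like, here’s a sketch of a small script that could run as a local hook. pre-commit passes the staged file names to a hook as command-line arguments; the wiring in .pre-commit-config.yaml is omitted here, and the size limit and function names are assumptions for illustration.

```python
import sys
from pathlib import Path

MAX_BYTES = 500_000  # Assumed limit; pick whatever suits your repo.


def oversized(paths: list[str], limit: int = MAX_BYTES) -> list[str]:
    """Return the paths whose file size exceeds the limit."""
    return [
        p for p in paths
        if Path(p).is_file() and Path(p).stat().st_size > limit
    ]


def main(argv: list[str]) -> int:
    """Report oversized files; a non-zero return value blocks the commit."""
    offenders = oversized(argv)
    for path in offenders:
        print(f"{path}: larger than {MAX_BYTES} bytes, consider DVC instead")
    return 1 if offenders else 0


# Only act when file names were passed in; with none there is nothing to check.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

In practice you’d start with ready-made hooks for formatting and linting, and only add a custom script like this for project-specific rules.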

Add a simple CLI or Makefile
Whether it’s a train command or make all, having a lightweight way to run key steps saves time and avoids confusion. Make your project runnable without guesswork.

Closing thoughts

I hope by now it’s clear that you don’t need a massive tech stack to build better ML projects.

What you need is structure, a clear starting point, and a system that lets you stay focused when things get messy (because they will).

Remember, good habits compound!

And project design is one of those small things that quietly makes everything else easier: collaboration, iteration, scaling, even debugging.

Start with structure. Then let good habits do the rest.

Thank you for reading! I hope these tips help you build better DS/ML projects.

See you next week!

- Andres

To Be a Data Scientist
