5 Comments
Preethi Subramanian

Hi Andres, thanks for sharing. I'm curious how test scripts can be added for data science projects. Could you give some examples?

Andres Vourakis

Great question! I usually add test scripts for the parts of the pipeline that are most likely to break silently, like data validation, feature engineering, and model output checks.

For example:

- Unit tests for data cleaning and feature functions (e.g. does create_days_since_signup() return the correct values?)

- And if I deploy a model, I add sanity checks on the outputs (e.g., flagging if too many predictions are the same)
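To make the two ideas above concrete, here is a minimal sketch using plain pytest-style test functions. Only the name create_days_since_signup() comes from the comment; its signature and body here are assumptions made for illustration, and too_many_identical_predictions() with its 0.9 threshold is a hypothetical helper, not a recommended default.

```python
from collections import Counter
from datetime import date


def create_days_since_signup(signup_dates, as_of):
    """Hypothetical feature function: whole days from each signup date to `as_of`."""
    return [(as_of - d).days for d in signup_dates]


def too_many_identical_predictions(predictions, max_share=0.9):
    """Hypothetical sanity check: True if one value dominates the predictions.

    `max_share` is an illustrative threshold; tune it to the model in question.
    """
    if not predictions:
        return True  # an empty batch is treated as suspicious as well
    top_count = Counter(predictions).most_common(1)[0][1]
    return top_count / len(predictions) > max_share


def test_create_days_since_signup():
    # Unit test for a feature function, using small hand-checked inputs.
    as_of = date(2024, 1, 31)
    result = create_days_since_signup([date(2024, 1, 1), date(2024, 1, 31)], as_of)
    assert result == [30, 0]


def test_output_sanity_check():
    # 95% identical predictions should be flagged; a 50/50 split should not.
    assert too_many_identical_predictions([1] * 95 + [0] * 5)
    assert not too_many_identical_predictions([1] * 50 + [0] * 50)
```

Saved as a file such as test_pipeline.py (a made-up name), these would be picked up by running pytest in the project folder. In a deployed setting the same check could run on each batch of predictions and raise an alert instead of failing a test.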

Hope that helps!

Preethi Subramanian

Thank you for the insights. I haven't worked with test scripts for data science projects. Do you have any suggestions or references I could use to get started? I would love to know more about this!

Simran Anand

Amazing 👏

Varun Sagar Theegala

Thanks for sharing the library reference, Andres. I recently took up a new data role that involves building analytical solutions for deployment, and I have learned the importance and benefits of building structured project folders. This is definitely going to be a helpful reference point.
