Hi Andres, thanks for sharing. I'm curious how test scripts can be added for data science projects. Could you give some examples?
Great question! I usually add test scripts for the parts of the pipeline that are most likely to break silently, like data validation, feature engineering, and model output checks.
For example:
- Unit tests for data cleaning and feature functions (e.g., does create_days_since_signup() return the correct values?)
- Sanity checks on the outputs of deployed models (e.g., flagging if too many predictions are identical)
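To make that concrete, here's a minimal sketch of what both kinds of check could look like with pytest-style assertions. The feature function body, column names, and the 95% threshold are all illustrative assumptions, not anyone's actual code:

```python
# Hypothetical sketch: a unit test for a feature function and a sanity
# check on model outputs. Names, columns, and thresholds are illustrative.
from datetime import date

import pandas as pd


def create_days_since_signup(df: pd.DataFrame, as_of: date) -> pd.DataFrame:
    """Feature function: days elapsed between signup_date and as_of."""
    out = df.copy()
    out["days_since_signup"] = (
        pd.Timestamp(as_of) - pd.to_datetime(out["signup_date"])
    ).dt.days
    return out


def test_days_since_signup():
    # Unit test: known inputs, hand-computed expected values.
    df = pd.DataFrame({"signup_date": ["2024-01-01", "2024-01-10"]})
    result = create_days_since_signup(df, date(2024, 1, 11))
    assert result["days_since_signup"].tolist() == [10, 1]


def check_prediction_diversity(preds, max_mode_share=0.95):
    """Sanity check: fail if too many predictions share a single value."""
    mode_share = pd.Series(preds).value_counts(normalize=True).iloc[0]
    assert mode_share <= max_mode_share, (
        f"{mode_share:.0%} of predictions are identical"
    )
```

The unit test runs in CI on every change, while the diversity check runs against live model outputs, so a silently degenerate model (e.g., one that starts predicting a single class for everything) gets flagged instead of slipping through.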
Hope that helps!
Thank you for the insights. I haven't worked with test scripts for data science projects before. Do you have any suggestions or references I could look into if I want to work on this? I would love to know more!
Amazing 👏
Thanks for sharing the library reference, Andres. As I recently took up a new data role that involves building analytical solutions for deployment, I've learned the importance and benefits of structured project folders. This is definitely going to be a helpful reference point.