Wonderful post, Andres. I feel people get bound to the datasets they see available on Kaggle, ready to use, and just build projects around them. I’m definitely not implying Kaggle doesn’t have interesting datasets.
However, techniques like using APIs and web scraping can help extract data for more unique topics and problem statements.
Cool! I used to love doing projects with Pushshift, which I believe was affiliated with Reddit, and then at some point they started charging a ton (I believe because they realized all the LLM companies were getting hugely valuable data for free).
Reddit locked down their API a while ago. Is it free now?
As far as I know it has a free tier that is available for non-commercial uses, such as personal projects and academic research.
But to be honest, I haven’t tried it since the changes, so I’m not too familiar with its limitations.
I’m going to do some research and update my article if necessary, thank you!