4 Comments

Wonderful post, Andres. I feel people get bound to the ready-to-use datasets they see on Kaggle and just build projects around them. I’m definitely not implying Kaggle doesn’t have interesting datasets.

However, techniques like using APIs and web scraping can help extract data for more unique topics and problem statements.

Reddit locked down their API a while ago. Is it free now?

As far as I know it has a free tier that is available for non-commercial uses, such as personal projects and academic research.

But to be honest, I haven’t tried it since the changes, so I’m not too familiar with its limitations.

I’m going to do some research and update my article if necessary, thank you!

Cool! I used to love doing projects with Pushshift, which I believe was affiliated with Reddit, and then at some point they started charging a ton (I believe because they realized all the LLM companies were getting hugely valuable data for free).
