4 Comments
Varun Sagar Theegala

Wonderful post, Andres. I feel that people get bound to the datasets they see readily available on Kaggle and just build projects around them. I’m definitely not implying that Kaggle doesn’t have interesting datasets.

However, techniques like using APIs and web scraping can help extract data for more unique topics and problem statements.

Joe Hovde

Reddit locked down their API a while ago. Is it free now?

Andres Vourakis

As far as I know it has a free tier that is available for non-commercial uses, such as personal projects and academic research.

But to be honest, I haven’t tried it since the changes, so I’m not too familiar with its limitations.

I’m going to do some research and update my article if necessary, thank you!

Joe Hovde

Cool! I used to love doing projects with Pushshift, which I believe was affiliated with Reddit, and then at some point they started charging a ton (I believe because they realized all the LLM companies were getting hugely valuable data for free).
