Emet Research's picture

Emet Research

EmetTheGolum
·

AI & ML interests

Data and Data Acquisition - A partner, not a vendor.

Recent Activity

View all activity

Organizations

Emet 's profile picture

EmetTheGolum's activity

reacted to fdaudens's post with ❤️ 4 days ago
view post
Post
1779
Reminder: Don’t. Use. ChatGPT. As. A. Calculator. Seriously. 🤖

Loved listening to @sasha on Hard Fork—it really made me think.

A few takeaways that hit home:
- Individual culpability only gets you so far. The real priority: demanding accountability and transparency from companies.
- Evaluate if generative AI is the right tool for certain tasks (like search) before using it.

Curious about the full conversation? https://www.nytimes.com/2025/01/17/podcasts/hardfork-tiktok-rednote-environment.html. Give it a listen—it’s worth it! 🌍
  • 1 reply
·
reacted to ezgikorkmaz's post with 🚀 4 days ago
reacted to cfahlgren1's post with ❤️ about 2 months ago
view post
Post
3153
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together / add split column
- Converting string messages into list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
  • 1 reply
·