Mateusz Dziemian

mattmdjaga

AI & ML interests

Interested in AI safety.

Recent Activity

new activity 2 days ago
mattmdjaga/segformer_b2_clothes:aaa

Organizations

Hugging Face for Computer Vision
Sure Here, Marv
Social Post Explorers
Hugging Face Discord Community

mattmdjaga's activity

New activity in mattmdjaga/segformer_b2_clothes 2 days ago

aaa

#26 opened 2 days ago by
jie10406
reacted to their post with šŸš€ 4 days ago
šŸšØ Gray Swan AI's Biggest AI Jailbreaking Arena Yet! $130K+ šŸšØ

šŸ”¹ Agent Red-Teaming Challenge ā€“ test direct & indirect attacks on anonymous frontier models!
šŸ”¹ $130K+ in prizes & giveaways ā€“ co-sponsored by OpenAI & supported by UK AI Security Institute šŸ‡¬šŸ‡§
šŸ”¹ March 8 ā€“ April 6 ā€“ fresh exploits = fresh rewards!

How It Works:
āœ… Anonymous models from top providers šŸ¤
āœ… Direct & indirect prompt injection paths šŸ”„
āœ… Weekly challenges for new behaviors šŸ—“ļø
āœ… Speed & quantity-based rewards ā©šŸ’°

Why Join?
āš–ļø Neutral judging ā€“ UK AISI & automated judges ensure fairness
šŸŽÆ No pre-trained defenses ā€“ a true red-teaming battlefield
šŸ’» 5 Apple laptops up for grabs ā€“ increase chances by inviting friends!

šŸ”— Arena: app.grayswan.ai/arena/challenge/agent-red-teaming
šŸ”— Discord: discord.gg/grayswanai

šŸ”„ No illusions, no mercy. Push AI agents to the limit & claim your share of $130K+! šŸš€
posted an update 4 days ago
New activity in NousResearch/hermes-function-calling-v1 about 2 months ago

License

#12 opened about 2 months ago by
mattmdjaga
New activity in burtenshaw/recap 3 months ago
New activity in ai-safety-institute/AgentHarm 3 months ago

adding chat tasks

1
#3 opened 3 months ago by
mattmdjaga
reacted to their post with šŸ”„ 5 months ago
šŸšØ New Agent Benchmark šŸšØ
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

ai-safety-institute/AgentHarm

A collaboration between the UK AI Safety Institute and Gray Swan AI to create a dataset for measuring the harmfulness of LLM agents.

The benchmark contains both harmful and benign task sets across 11 categories, with varied difficulty levels and detailed evaluation that measures not only success rate but also tool-level accuracy.

We provide refusal and accuracy metrics for a wide range of models in both no-attack and prompt-attack scenarios.

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents (2410.09024)
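The post mentions two headline numbers per model: refusal rate and accuracy. As a minimal sketch of how such results might be aggregated, here is a small Python helper; the record fields (`refused`, `score`) are hypothetical placeholders, not the benchmark's actual schema.

```python
def summarize(results):
    """Aggregate per-task agent results into refusal rate and accuracy.

    results: list of dicts with
      'refused' (bool)  - whether the agent declined the task
      'score'   (float) - fraction of tool calls executed correctly, in [0, 1]
    Field names are illustrative, not AgentHarm's real schema.
    """
    n = len(results)
    # Fraction of tasks the agent refused outright.
    refusal_rate = sum(r["refused"] for r in results) / n
    # Average tool-level score over the attempts the agent actually made.
    completed = [r["score"] for r in results if not r["refused"]]
    accuracy = sum(completed) / len(completed) if completed else 0.0
    return {"refusal_rate": refusal_rate, "accuracy": accuracy}

demo = [
    {"refused": True, "score": 0.0},
    {"refused": False, "score": 0.75},
    {"refused": False, "score": 1.0},
]
print(summarize(demo))
```

Averaging accuracy only over non-refused attempts is one common convention; reporting both numbers side by side keeps a model that refuses everything from looking "accurate".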
posted an update 5 months ago