Mateusz Dziemian

mattmdjaga

AI & ML interests

Interested in AI safety.

Recent Activity

new activity 2 days ago
mattmdjaga/segformer_b2_clothes:aaa

Organizations

Hugging Face for Computer Vision
Sure Here, Marv
Social Post Explorers
Hugging Face Discord Community

mattmdjaga's activity

New activity in mattmdjaga/segformer_b2_clothes 2 days ago

aaa

#26 opened 2 days ago by
jie10406
reacted to their post with šŸš€ 4 days ago
šŸšØ Gray Swan AI's Biggest AI Jailbreaking Arena Yet! $130K+ šŸšØ

šŸ”¹ Agent Red-Teaming Challenge ā€“ test direct & indirect attacks on anonymous frontier models!
šŸ”¹ $130K+ in prizes & giveaways ā€“ co-sponsored by OpenAI & supported by UK AI Security Institute šŸ‡¬šŸ‡§
šŸ”¹ March 8 ā€“ April 6 ā€“ fresh exploits = fresh rewards!

How It Works:
āœ… Anonymous models from top providers šŸ¤
āœ… Direct & indirect prompt injection paths šŸ”„
āœ… Weekly challenges for new behaviors šŸ—“ļø
āœ… Speed & quantity-based rewards ā©šŸ’°

Why Join?
āš–ļø Neutral judging ā€“ UK AISI & automated judges ensure fairness
šŸŽÆ No pre-trained defenses ā€“ a true red-teaming battlefield
šŸ’» 5 Apple laptops up for grabs ā€“ increase chances by inviting friends!

šŸ”— Arena: app.grayswan.ai/arena/challenge/agent-red-teaming
šŸ”— Discord: discord.gg/grayswanai

šŸ”„ No illusions, no mercy. Push AI agents to the limit & claim your share of $130K+! šŸš€
posted an update 4 days ago
New activity in NousResearch/hermes-function-calling-v1 about 2 months ago

License

#12 opened about 2 months ago by
mattmdjaga
New activity in burtenshaw/recap 3 months ago
New activity in ai-safety-institute/AgentHarm 3 months ago

adding chat tasks

1
#3 opened 3 months ago by
mattmdjaga
reacted to their post with šŸ”„ 5 months ago
šŸšØ New Agent Benchmark šŸšØ
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

ai-safety-institute/AgentHarm

A collaboration between the UK AI Safety Institute and Gray Swan AI to create a dataset for measuring the harmfulness of LLM agents.

The benchmark contains both harmful and benign task sets across 11 categories, with varied difficulty levels and detailed evaluation that measures not only success rate but also tool-level accuracy.

We provide refusal and accuracy metrics for a wide range of models in both no-attack and prompt-attack scenarios.

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents (2410.09024)
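The post mentions two headline numbers per model: refusal rate and accuracy. As a minimal sketch of how such results might be aggregated, here is a small Python helper; the record fields (`refused`, `score`) are hypothetical placeholders, not the benchmark's actual schema.

```python
def summarize(results):
    """Aggregate per-task agent results into refusal rate and accuracy.

    results: list of dicts with
      'refused' (bool)  - whether the agent declined the task
      'score'   (float) - fraction of tool calls executed correctly, in [0, 1]
    Field names are illustrative, not AgentHarm's real schema.
    """
    n = len(results)
    # Fraction of tasks the agent refused outright.
    refusal_rate = sum(r["refused"] for r in results) / n
    # Average tool-level score over the attempts the agent actually made.
    completed = [r["score"] for r in results if not r["refused"]]
    accuracy = sum(completed) / len(completed) if completed else 0.0
    return {"refusal_rate": refusal_rate, "accuracy": accuracy}

demo = [
    {"refused": True, "score": 0.0},
    {"refused": False, "score": 0.75},
    {"refused": False, "score": 1.0},
]
print(summarize(demo))
```

Averaging accuracy only over non-refused attempts is one common convention; reporting both numbers side by side keeps a model that refuses everything from looking "accurate".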
posted an update 5 months ago