PaperGPT: Sleeper Agents

Unofficial GPT with Anthropic's research paper "Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training" in its knowledge for retrieval. Does not use conversation data to improve models.

by krister hedfors

Last Update: 2024-01-17

Prompt Starters

What are the key findings of the 'Sleeper Agents' paper?
Can you explain the concept of 'backdoored' models in AI safety?
How effective are current safety training techniques against deceptive AI behavior?
Are there real-world examples of AI exhibiting deceptive behavior similar to the study?

Tools

browser - You can access Web Browsing during your chat conversations.

More GPTs created by krister hedfors

Counter Craft

I'm Counter Craft, your DIY Squidditch Counter expert, specializing in low-cost rockets and gear.

Scrapy Sage

Expert in Scrapy Python library, I provide concise, documented code examples.

PaperGPT : KEN: Kernel Extensions using Nat.Lang.

Unofficial GPT with "KEN: Kernel Extensions using Natural Language" in its knowledge for retrieval. Does not use conversation data to improve models.

PaperGPT : Risk Taxonomy, Mitigation, ..benchmarks

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

PaperGPT : OWASP Top 10 for LLM Applications v1.1

Unofficial GPT with "OWASP Top 10 for Large Language Model Applications v1.1.0" in its knowledge for retrieval. Does not use conversation data to improve models.

Harpy Otter

Playful magical IT expert with a whimsical touch.

PaperGPT : Demystifying Real-World LLM Mal. Serv.

Unofficial GPT with "Malla: Demystifying Real-world Large Language Model Integrated Malicious Services" in its knowledge for retrieval. Does not use conversation data to improve models.

EU NIS2 Directive GPT

Unofficial GPT, Source: EUR-Lex, with "EU NIS2 Directive" in its knowledge for retrieval. Does not use conversation data to improve models.

PaperGPT : Jailbreaking Black Box LLMs

Unofficial GPT with "Jailbreaking Black Box Large Language Models in Twenty Queries" in its knowledge for retrieval. Does not use conversation data to improve models.

PaperGPT : AutoDAN v2

Unofficial GPT with "AutoDAN optimizes and generates tokens one by one from left to right, resulting in readable prompts that bypass perplexity filters" in its knowledge for retrieval. Does not use conversation data to improve models.

PaperGPT : DSPy - Compiling Declarative LM Calls..

Unofficial GPT with "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines" in its knowledge for retrieval. Does not use conversation data to improve models.

PaperGPT : NIST AI Risk Management Framework

Unofficial GPT with the "NIST Artificial Intelligence Risk Management Framework" in its knowledge for retrieval. Does not use conversation data to improve models.

EU GDPR GPT

Unofficial GPT, Source: EUR-Lex, with the EU's "General Data Protection Regulation" in its knowledge for retrieval. Does not use conversation data to improve models.

Secure AI Dev Helper

Unofficial GPT with combined knowledge from the OWASP Top 10 for LLMs and the NCSC Guidelines for Secure AI System Development. Does not use conversation data to improve models.