Researchers Uncover New Methods To Defend AI Models Against Universal Jailbreaks
Researchers from the Anthropic Safeguards Research Team have developed a new approach to protecting AI models from universal jailbreaks. The method, known as Constitutional Classifiers, has withstood thousands of hours of human red teaming as well as synthetic evaluations. Universal jailbreaks are inputs designed to bypass the safety guardrails of AI models, […]
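The core idea behind classifier-based guardrails can be illustrated with a minimal sketch. Note the sketch below is purely hypothetical: the real Constitutional Classifiers are trained models, whereas this toy stand-in scores prompts with a simple keyword heuristic; the names `input_classifier_score` and `guarded_generate` are invented for illustration.

```python
# Hypothetical sketch of a classifier-guarded model wrapper (not Anthropic's
# actual implementation). A trained input classifier is approximated here by
# a toy keyword heuristic purely to show the control flow.

BLOCKLIST = {"bypass", "jailbreak"}  # toy stand-in for a learned classifier


def input_classifier_score(prompt: str) -> float:
    """Return a toy harm score in [0, 1]; a real system would use a trained model."""
    words = prompt.lower().split()
    hits = sum(1 for word in words if word in BLOCKLIST)
    return min(1.0, hits / 2)


def guarded_generate(prompt: str,
                     model=lambda p: f"response to: {p}",
                     threshold: float = 0.5) -> str:
    """Refuse when the input classifier flags the prompt; otherwise call the model."""
    if input_classifier_score(prompt) >= threshold:
        return "[refused by input classifier]"
    return model(prompt)
```

A production system would typically screen both inputs and outputs with separate classifiers; the sketch shows only the input side for brevity.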
The post Researchers Uncover New Methods To Defend AI Models Against Universal Jailbreaks appeared first on Cyber Security News.