Mitigating (Indirect) Data Poisoning
Robust strategies for data integrity, anomaly detection, and defensive training protocols to prevent indirect data poisoning from producing intentionally misaligned AI systems (not unlike "sleeper agents").
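As a minimal sketch of the anomaly-detection piece, the snippet below flags training examples whose feature vectors lie unusually far from the dataset centroid, so they can be held out for review before training. It assumes per-example feature vectors (e.g. embeddings) are already computed; the function name, threshold, and synthetic data are illustrative, not a reference implementation.

```python
import numpy as np

def flag_anomalous_examples(features: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Flag examples far from the dataset centroid in per-dimension
    standardized distance. Returns a boolean mask of suspected outliers
    to quarantine for manual review before training."""
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8          # avoid division by zero
    z = (features - mean) / std                # per-dimension z-scores
    # Normalize so a typical in-distribution example scores near 1.0.
    distances = np.linalg.norm(z, axis=1) / np.sqrt(features.shape[1])
    return distances > threshold

# Illustration: 1000 benign examples plus 5 injected outliers.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(1000, 32))
poisoned = rng.normal(8.0, 1.0, size=(5, 32))
data = np.vstack([clean, poisoned])
mask = flag_anomalous_examples(data)
```

Simple centroid-distance filters like this catch only crude poisoning; stealthier attacks (clean-label or distributed triggers) motivate the stronger integrity and defensive-training protocols this gap covers.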
Resources (1)
AI Risk and Threat Taxonomy
Whitepapers and Essays
R&D Gaps (1)
The potential for AI systems to behave unpredictably or dangerously ("go rogue") is a critical concern; data poisoning is one pathway to such behavior, so ensuring safe and controllable AI architectures is essential for reliable operation.
See also:
• https://www.lesswrong.com/posts/fAW6RXLKTLHC3WXkS/shallow-review-of-technical-ai-safety-2024
• https://deepmind.google/discover/blog/taking-a-responsible-path-to-agi/