AI Could Go Rogue
The potential for AI systems to behave unpredictably or dangerously (“go rogue”) is a critical concern. Designing AI architectures that remain safe and controllable is essential for reliable operation.
See also:
• https://www.lesswrong.com/posts/fAW6RXLKTLHC3WXkS/shallow-review-of-technical-ai-safety-2024
• https://deepmind.google/discover/blog/taking-a-responsible-path-to-agi/
Foundational Capabilities (7)
Develop hardware-level governance mechanisms, including tamper-proof hardware, to enforce safety and compliance constraints on AI systems and keep them within robust operational limits (a minimal attestation-gate sketch appears after this list).
Study the neural basis of human social instincts to inform AI design, so that AI systems can safely interpret and emulate human social behavior.
Use AI to enhance the interpretability of other AI systems, creating tools that automatically explain and verify AI behavior (the shape of this loop is sketched after this list).
Develop and implement AI architectures with separable, auditable world models, in which safety is specified in terms of the model’s state space and proposed AI outputs come with proofs that they do not leave the safe region of that state space (a simplified gate of this kind is sketched after this list).
Develop robust data-integrity strategies, anomaly detection, and defensive training protocols to mitigate situations where indirect data poisoning could produce intentionally misaligned AI systems (not unlike “sleeper agents”); a minimal filtering sketch appears after this list.
Build digital fortresses that enable sensitive data to be processed in a controlled, privacy-preserving environment (a minimal sketch appears after this list).
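
For the hardware-governance capability, the following is a minimal sketch of one software-visible piece of the idea: a gate that refuses to launch an AI workload unless a signed attestation report verifies against an allow-listed firmware build. All names (TRUSTED_KEY, APPROVED_FIRMWARE, launch_workload) are hypothetical, and the HMAC-signed report is a stand-in for a real TPM or TEE quote.

```python
# Minimal sketch (assumed interfaces): a software-side gate that refuses to
# launch an AI workload unless a hardware attestation report verifies.
# In a real system the report would come from a TPM/TEE quote; the "device"
# and signing key here are mocked for illustration.
import hmac
import hashlib
import json

TRUSTED_KEY = b"shared-secret-provisioned-at-manufacture"   # stand-in for a device key
APPROVED_FIRMWARE = {"fw-1.4.2"}                             # allow-listed firmware builds

def verify_attestation(report: dict, signature: bytes) -> bool:
    """Check the report's signature and that its firmware build is allow-listed."""
    payload = json.dumps(report, sort_keys=True).encode()
    expected = hmac.new(TRUSTED_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return False
    return report.get("firmware") in APPROVED_FIRMWARE

def launch_workload(report: dict, signature: bytes) -> str:
    if not verify_attestation(report, signature):
        raise PermissionError("attestation failed: workload blocked")
    return "workload started under verified constraints"

# Example: a correctly signed report from approved firmware is allowed to run.
report = {"device_id": "accel-007", "firmware": "fw-1.4.2"}
sig = hmac.new(TRUSTED_KEY, json.dumps(report, sort_keys=True).encode(), hashlib.sha256).digest()
print(launch_workload(report, sig))
```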
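
For the AI-assisted interpretability capability, one common pattern is to collect a feature's top-activating inputs from a target model, ask a second explainer model to summarize them, and then verify the explanation. The sketch below mocks both models with stubs; get_feature_activations, explain_feature, and verify_explanation are hypothetical names, and the point is the shape of the loop rather than any particular tool.

```python
# Minimal sketch (hypothetical names throughout): the "model explains model"
# loop. A target model's activations are collected, the top-activating inputs
# for one feature are gathered, an explainer summarizes them, and the
# explanation is scored against held-out activations.
import numpy as np

def get_feature_activations(texts, feature_idx):
    # Stand-in for running the target model and reading one feature's activations.
    rng = np.random.default_rng(feature_idx)
    return rng.random(len(texts))

def explain_feature(top_examples):
    # Stand-in for an explainer model; a real system would prompt an LLM here.
    return "feature seems to respond to inputs like: " + "; ".join(top_examples)

def verify_explanation(explanation, texts, activations):
    # Stand-in for verification: score how well the explanation predicts
    # which inputs activate the feature (trivially zero in this mock).
    return 0.0

corpus = ["the cat sat", "stock prices fell", "def foo():", "2 + 2 = 4", "hello world"]
acts = get_feature_activations(corpus, feature_idx=17)
top = [corpus[i] for i in np.argsort(acts)[::-1][:3]]
explanation = explain_feature(top)
print(explanation)
print("verification score:", verify_explanation(explanation, corpus, acts))
```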
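
The auditable world-model capability can be illustrated with a toy gate: safety is a predicate on the model's state space, and a proposed action sequence is accepted only if the predicted trajectory stays inside the safe region. The bounded-horizon trace below stands in for the proof certificate a full system would attach; the dynamics and names are invented for illustration.

```python
# Minimal sketch (simplified stand-in): an output gate that accepts a proposed
# action plan only if the world model predicts the resulting trajectory stays
# inside an explicitly specified safe region of its state space. The returned
# trace is a toy "certificate" that an independent checker re-verifies.
from dataclasses import dataclass

@dataclass
class WorldModel:
    position: float
    def step(self, action: float) -> "WorldModel":
        # Toy dynamics: the action shifts the modeled state.
        return WorldModel(self.position + action)

def is_safe(state: WorldModel) -> bool:
    # Safety specified directly on the model's state space.
    return -1.0 <= state.position <= 1.0

def certify(model: WorldModel, actions: list[float]):
    """Return the predicted trace if every state stays safe, else None."""
    trace, state = [model], model
    for a in actions:
        state = state.step(a)
        if not is_safe(state):
            return None
        trace.append(state)
    return trace

def check_certificate(trace) -> bool:
    # An auditor can re-check the certificate without trusting the proposer.
    return all(is_safe(s) for s in trace)

proposal = [0.3, 0.4, 0.2]          # proposed output: a short action plan
cert = certify(WorldModel(0.0), proposal)
print("accepted" if cert and check_certificate(cert) else "rejected")
```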
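
For the data-poisoning capability, one minimal ingredient is an integrity filter that flags embedding-space outliers before they reach training. The sketch below uses a median/MAD threshold as an assumed, deliberately simple detector; real defenses would combine provenance checks, deduplication, and stronger anomaly detectors, but the gating pattern is the same.

```python
# Minimal sketch (illustrative only): a pre-training data-integrity filter that
# flags anomalous examples in embedding space before they enter the training set.
import numpy as np

def flag_outliers(embeddings: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Return a boolean mask of suspected-poisoned rows using a robust threshold."""
    center = np.median(embeddings, axis=0)
    dists = np.linalg.norm(embeddings - center, axis=1)
    mad = np.median(np.abs(dists - np.median(dists))) + 1e-12
    return dists > np.median(dists) + k * mad

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(500, 32))
poison = rng.normal(8.0, 0.5, size=(5, 32))   # injected outliers far from the clean cluster
data = np.vstack([clean, poison])

mask = flag_outliers(data)
print(f"flagged {mask.sum()} of {len(data)} examples; kept {len(data) - mask.sum()} for training")
```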
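
Finally, the digital-fortress capability can be read as a controlled environment that never releases raw records, only noised aggregates. The DataFortress class below is a hypothetical sketch loosely in the style of differential privacy, not a hardened design; enclaves, access control, audit logging, and a proper privacy accountant are out of scope.

```python
# Minimal sketch (hypothetical design): a "fortress" object that keeps raw
# records private and only ever releases noised aggregate answers.
import random

class DataFortress:
    def __init__(self, records: list[float], epsilon: float = 1.0):
        self._records = list(records)   # raw data never leaves this object
        self._epsilon = epsilon

    def noisy_mean(self, lower: float, upper: float) -> float:
        """Release a clipped mean with Laplace noise scaled to its sensitivity."""
        clipped = [min(max(r, lower), upper) for r in self._records]
        true_mean = sum(clipped) / len(clipped)
        sensitivity = (upper - lower) / len(clipped)
        # Laplace(0, 1) sample as the difference of two Exp(1) draws.
        noise = random.expovariate(1) - random.expovariate(1)
        return true_mean + noise * sensitivity / self._epsilon

fortress = DataFortress([41.0, 37.5, 52.0, 48.5, 45.0])
print(round(fortress.noisy_mean(lower=0.0, upper=100.0), 2))
```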