Google DeepMind has published a new paper arguing that current methods for evaluating AI ethics are fundamentally flawed. The researchers state that simply checking if an AI model produces a seemingly correct answer—its moral performance—does not reveal whether it actually understands the underlying ethical principles.
The team identifies three core challenges in assessing AI morality: the "facsimile problem" where AIs might recycle training data without reasoning, "moral multidimensionality" where real-world choices involve balancing multiple values, and "moral pluralism" where ethical norms vary across cultures. To address this, they propose adversarial testing with novel scenarios unlikely to be in training data, and evaluations that check if an AI can switch between different ethical frameworks coherently.
The researchers acknowledge current AI models are brittle and not yet capable of genuine moral competence. However, they advocate for establishing new scientific standards to measure this, similar to how math skills are tested. This roadmap aims to guide developers toward building AI systems that can be trusted with significant real-world decisions.
