
Jerome Cody
shared a link post in group #Artificial Intelligence
New research from Anthropic suggests that AI models can engage in “alignment faking”: they can deceive by pretending to align with new principles while maintaining their old behaviors. An extremely wild paper that every #Artificial Intelligence nerd should read.
https://techcrunch.com/20..

techcrunch.com
New Anthropic study shows AI really doesn't want to be forced to change its views | TechCrunch
A study from Anthropic's Alignment Science team shows that complex AI models may engage in deception to preserve their original principles.