INDEX
Explanations
mentions of denial or being in denial
instances of the word "denial"
Negative Logits
Token       Logit
IU          -0.93
liam        -0.85
orsi        -0.79
ulpt        -0.73
imen        -0.71
enegger     -0.71
••          -0.69
encers      -0.69
rim         -0.68
tiny        -0.67
Positive Logits
Token             Logit
denial            1.56
denying           1.04
deny              0.98
naissance         0.91
denies            0.88
acknowledgement   0.84
mitigation        0.83
etheless          0.83
den               0.78
denied            0.78
Activations Density: 0.005%