INDEX
Explanations
discussions about perspective-taking and empathy
Negative Logits
Strict      -0.16
ernals      -0.15
AVA         -0.15
Chow        -0.15
engu        -0.15
estead      -0.15
raud        -0.14
orden       -0.14
Strict      -0.14
aternity    -0.14
Positive Logits
Ñıб         0.15
pres        0.15
Others      0.15
canf        0.14
Cunningham  0.14
backs       0.14
gel         0.14
pres        0.14
aupt        0.14
Others      0.13
Activations Density 0.233%