INDEX
Explanations
references to collective experiences and shared emotional responses among groups
Negative Logits
alone    -0.15
nbr      -0.15
IBLE     -0.15
pane     -0.14
ylie     -0.14
æ¡IJ     -0.14
pers     -0.13
onde     -0.13
umper    -0.13
ondo     -0.13
Positive Logits
alike      0.22
everybody  0.16
except     0.16
Except     0.16
except     0.15
_except    0.15
Except     0.15
Everyone   0.15
udes       0.15
rat        0.15
Activations Density 0.147%