Explanations
references to understanding and interpreting emotions or thoughts of others
Negative Logits
Token     Logit
vern      -0.15
erset     -0.15
byt       -0.14
bjerg     -0.14
punk      -0.14
سÙĦ       -0.14
apore     -0.14
hjem      -0.13
.voice    -0.13
Arms      -0.13
Positive Logits
Token     Logit
inner     0.18
intros    0.17
opaque    0.17
Inner     0.17
interior  0.17
insight   0.16
Inner     0.16
alien     0.16
Hann      0.16
inner     0.16
Activations Density 0.165%
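
The density figure above is commonly read as the fraction of token positions on which the feature activates. The source does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the synthetic activation list are all hypothetical, with the counts chosen only so that the result matches the 0.165% shown.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default). The dashboard's real
    definition is not stated in the source.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Synthetic data (not from the dashboard): 330 active positions out of 200,000.
synthetic = [1.0] * 330 + [0.0] * 199_670
density = activation_density(synthetic)
print(f"{density:.3%}")  # prints 0.165%
```

Under this reading, a density of 0.165% means the feature fires on roughly 1 in 600 token positions, which makes it fairly sparse.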