Explanations
elements related to social interactions and emotional responses
Negative Logits
Token         Logit
omik          -0.15
etine         -0.14
racat         -0.14
ваем          -0.14
éłĨ           -0.13
ãĥ¯ãĤ¤ãĥĪ     -0.13
Comparer      -0.13
æ°            -0.13
fills         -0.13
ewidth        -0.13
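
Some entries in the negative-logit list, such as `éłĨ` and `ãĥ¯ãĤ¤ãĥĪ`, look like byte-level BPE tokens shown in their raw vocabulary form rather than corrupted text. The following is a minimal sketch of how such strings could be mapped back to readable text, assuming the tokenizer uses the GPT-2 style byte-to-character table (the page does not say which tokenizer is used). `decode_token` is a hypothetical helper name, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style table mapping each byte to a printable character."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is assigned the next unused code point from 256 up.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Hypothetical helper: turn a raw vocabulary string into readable text.

    Tokens containing characters outside the table (assumed to be already
    decoded) are returned unchanged. Incomplete UTF-8 sequences are shown
    with the replacement character.
    """
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["omik", "ãĥ¯ãĤ¤ãĥĪ", "ваем"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

A token that covers only part of a multi-byte character cannot be decoded on its own, which is why the sketch falls back to the replacement character instead of raising an error.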
Positive Logits
Token         Logit
others        0.63
Others        0.55
others        0.52
Others        0.49
another       0.35
some          0.32
Another       0.30
another       0.28
Some          0.28
ones          0.27
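
Lists like the two above are typically obtained by ranking vocabulary tokens by the feature's effect on their output logits and keeping the extremes at each end. The sketch below shows only that ranking step. It assumes the per-token effects have already been computed (the page does not state how), and `top_logits` and `sample` are hypothetical names; the sample values are copied from the lists above purely as input data.

```python
import heapq


def top_logits(logit_effects, k=10):
    """Return the k most positive and k most negative (token, effect) pairs.

    `logit_effects` is a list of pairs rather than a dict because the same
    token string can appear more than once in a dashboard listing.
    """
    positive = heapq.nlargest(k, logit_effects, key=lambda pair: pair[1])
    negative = heapq.nsmallest(k, logit_effects, key=lambda pair: pair[1])
    return positive, negative


# Sample input taken from the lists above (illustrative subset only).
sample = [
    ("others", 0.63), ("Others", 0.55), ("another", 0.35), ("some", 0.32),
    ("omik", -0.15), ("etine", -0.14), ("fills", -0.13),
]

positive, negative = top_logits(sample, k=3)
print("positive:", positive)
print("negative:", negative)
```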
Activations Density: 0.174%
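
The density figure can be read as the share of sampled tokens on which the feature fires at all. A minimal sketch under that assumption follows; the page does not define the metric, and `activation_density`, its `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes density means "share of tokens with a nonzero activation";
    the dashboard itself does not spell out its definition.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


# Made-up activations: 3 active positions out of 1000.
sample_activations = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample_activations):.3%}")
```

Under this reading, a density of 0.174% would mean the feature is active on roughly 1 in every 575 tokens.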