Explanations
expressions of emotional experiences and self-awareness
Negative Logits
oulos       -0.19
chwitz      -0.15
ixel        -0.15
åŃĿ         -0.15
ardi        -0.14
Hint        -0.14
istro       -0.14
Gazette     -0.14
RTOS        -0.14
deniz       -0.14
Positive Logits
odd         0.28
weir        0.24
weird       0.23
peculiar    0.22
strange     0.21
isol        0.20
isolation   0.19
odd         0.19
strang      0.19
_odd        0.19
Activation Density: 0.016%