Explanations
references to personal relationships and emotional connections
Negative Logits
Token       Logit
ź           -0.16
idia        -0.16
ó           -0.14
uku         -0.14
zcze        -0.14
iamo        -0.14
DT          -0.14
uforia      -0.14
regional    -0.13
abyrinth    -0.13
Positive Logits
Token       Logit
cold        0.24
expression  0.24
ind         0.20
Expression  0.19
expression  0.19
cold        0.18
Expression  0.17
pale        0.17
Cold        0.17
speech      0.17
Activations Density: 0.024%