INDEX
Explanations
negative emotional states or pessimistic expressions
Negative Logits
ьаж       -0.70
McKe      -0.60
accru     -0.59
OPPO      -0.58
OGND      -0.56
Gaud      -0.55
auß       -0.55
unarmed   -0.55
ɜ         -0.54
McC       -0.54
Positive Logits
"):                   0.67
?')                   0.67
(no visible token)    0.66
]";                   0.65
")));                 0.65
')                    0.65
;");                  0.63
}$                    0.63
"));                  0.63
pædia                 0.62
Activations Density 0.017%