INDEX
Explanations
phrases associated with fear and concern related to decision-making and expression
Negative Logits
apos       -0.18
fte        -0.16
IEW        -0.15
kte        -0.15
emain      -0.15
æĪ         -0.14
ook        -0.14
asename    -0.14
λογ        -0.14
agen       -0.14
Positive Logits
because      0.16
RuleContext  0.15
gers         0.15
bery         0.15
aju          0.14
Yue          0.14
Civic        0.14
rage         0.14
ิà¸ļ          0.14
AxisSize     0.13
Activations Density: 0.330%