Explanations
Sentences that express fear or hesitation about taking action
Negative Logits
輪        -0.15
è®        -0.15
azio      -0.15
ials      -0.14
.fi       -0.14
уча       -0.14
neau      -0.13
ached     -0.13
иму       -0.13
immune    -0.13
Positive Logits
McMahon   0.16
ifter     0.16
tro       0.16
Tro       0.15
اذ        0.14
iris      0.14
brid      0.14
تين       0.14
πη        0.14
_slope    0.14
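Lists like these are often produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values. The minimal sketch below illustrates that ranking step under that assumption. `logit_effects`, `top_and_bottom`, `decoder_vec`, `W_U`, `vocab`, and the toy numbers are all hypothetical, and the sketch does not reproduce the values listed above.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: project a feature's decoder direction through an
# unembedding matrix W_U of shape (d_model, vocab_size). Entry j of the
# result is the dot product of decoder_vec with column j of W_U.
def logit_effects(decoder_vec: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    d_model = len(decoder_vec)
    vocab_size = len(W_U[0])
    return [sum(decoder_vec[i] * W_U[i][j] for i in range(d_model))
            for j in range(vocab_size)]


# Hypothetical helper: return the k most negative and k most positive tokens.
def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return negative, positive


# Toy data (made up): d_model = 2, vocab_size = 3.
decoder_vec = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.0],
       [0.1, 0.2, -0.4]]
vocab = ["tro", "azio", "iris"]

effects = logit_effects(decoder_vec, W_U)
negative, positive = top_and_bottom(effects, vocab, k=1)
print("negative:", negative)
print("positive:", positive)
```

Sorting the full vocabulary is fine for a toy example. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting everything.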
Activation Density: 0.013%
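A density of 0.013% corresponds to roughly 13 of every 100,000 tokens. The minimal sketch below assumes density is the share of sampled tokens on which the feature's activation is strictly positive. The function name, the zero threshold, and the sample are hypothetical.

```python
from typing import Sequence


# Hypothetical helper: percentage of sampled tokens on which a feature fires,
# i.e. its activation exceeds `threshold` (0.0 here, an assumption).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Made-up sample: the feature fires on 13 of 100,000 tokens.
sample = [0.0] * 99_987 + [2.5] * 13
print(f"{activation_density(sample):.3f}%")  # prints 0.013%
```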