Explanations
negative contractions related to denial or refusal
Negative Logits
LES     -0.16
uard    -0.16
DMI     -0.16
gis     -0.15
ÂĿ      -0.14
les     -0.14
524     -0.14
sec     -0.14
ena     -0.13
eval    -0.13
Positive Logits
not     0.52
not     0.42
Not     0.36
NOT     0.33
Not     0.32
't      0.30
’t      0.30
not     0.30
.not    0.28
NOT     0.27
Activations Density: 0.344%