Explanations
negations or negative expressions in the text
Negative Logits
Token        Logit
ิลปะ         -0.68
<bos>        -0.68
Atem         -0.64
ing          -0.62
teil         -0.60
Wadsworth    -0.58
bạch         -0.58
merito       -0.56
yana         -0.56
y            -0.56
Positive Logits
Token        Logit
doesn        1.87
Doesn        1.80
doesn        1.79
Doesn        1.78
didn         1.43
DOES         1.34
Does         1.29
Does         1.29
does         1.28
Didn         1.27
Activation density: 0.027%