INDEX
Explanations
negations and contractions related to uncertainty or denial
NEGATIVE LOGITS
not      -0.30
NOT      -0.19
nicht    -0.18
Ïģθ      -0.18
no       -0.17
never    -0.17
не       -0.17
không    -0.16
hen      -0.16
Not      -0.16
POSITIVE LOGITS
necessarily    0.37
anymore        0.26
yet            0.24
ched           0.22
ecessarily     0.21
ori            0.21
ches           0.20
quite          0.19
yet            0.19
epad           0.19
Activations Density 0.196%