INDEX
Explanations
negations and expressions of doubt or denial
Negative Logits
Token        Logit
not          -0.19
asn          -0.15
não          -0.15
ikke         -0.15
somewhat     -0.15
no           -0.15
nicht        -0.14
772          -0.14
bruar        -0.14
never        -0.14
Positive Logits
Token        Logit
oriously     0.25
ori          0.25
anymore      0.24
necessarily  0.23
ched         0.23
epad         0.22
ches         0.21
yet          0.19
tingham      0.19
ional        0.18
Activations Density 0.289%
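
The source does not define the density figure. One plausible reading is the share of sampled token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature
    fires at all. The source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 100,000 token positions, 289 of which fire.
toy_activations = [0.0] * 99_711 + [1.5] * 289
print(f"{activation_density(toy_activations):.3%}")  # prints 0.289%
```

Under this reading, a density of 0.289% would mean the feature is active on roughly 3 in every 1,000 tokens.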