Explanations
phrases expressing negation or contradiction
interrogative phrases indicating denial or negation
Negative Logits
Token      Logit
aly        -0.72
Vers       -0.67
iland      -0.67
cember     -0.65
haul       -0.63
pez        -0.62
tnc        -0.61
umen       -0.61
olen       -0.61
istries    -0.60
Positive Logits
Token        Logit
nor          0.92
anymore      0.87
necessarily  0.73
anybody      0.72
anyone       0.71
any          0.67
slightest    0.67
ndra         0.65
hemy         0.64
}\           0.63
Activations Density: 0.037%