INDEX
Explanations
negative statements or contradictions
negation phrases or words indicating denial
Negative Logits
anmar     -0.74
iery      -0.67
senal     -0.66
ngth      -0.65
cribed    -0.65
ugu       -0.62
obo       -0.60
unks      -0.60
owan      -0.60
arine     -0.59
Positive Logits
etheless       0.82
enough         0.81
necessarily    0.77
conclusive     0.71
nonetheless    0.70
unanimous      0.69
exact          0.66
deter          0.65
clin           0.64
sufficient     0.63
Activations Density: 0.304%