Explanations
negations or contradictions in sentences
negations or phrases that express what should not be done or considered
Negative Logits
tein      -0.78
velt      -0.70
WER       -0.67
LIN       -0.67
hower     -0.66
ulty      -0.65
LG        -0.65
rift      -0.64
weekly    -0.64
LU        -0.62
Positive Logits
necessarily    1.36
icably         1.14
epad           1.10
icable         1.00
etheless       0.99
bothering      0.91
adequately     0.86
bothered       0.85
eworthy        0.84
ifies          0.82
Activations Density: 0.048%