INDEX
Explanations
instances where a statement contradicts something in the text
the word "not" in various contexts
Negative Logits
tein      -0.81
rift      -0.78
ixel      -0.78
arten     -0.70
istle     -0.68
TIME      -0.67
Tycoon    -0.66
Times     -0.66
ngth      -0.63
velt      -0.62
Positive Logits
necessarily     1.29
icable          1.05
etheless        1.04
epad            0.96
eworthy         0.94
icably          0.90
withstanding    0.89
enough          0.84
cially          0.78
bothering       0.75
Activations Density: 0.045%