INDEX
Explanations
negations or contradictions in the text
Negative Logits
spor          -0.72
tein          -0.64
creations     -0.63
crit          -0.62
propelled     -0.61
rotated       -0.61
indirectly    -0.60
que           -0.59
friends       -0.59
facult        -0.59
Positive Logits
unanim        0.83
umerable      0.81
eway          0.79
enough        0.75
shortage      0.75
omore         0.75
enough        0.71
yet           0.71
overlap       0.70
anymore       0.69
Activations Density: 0.055%