Explanations
contradictions or unexpected combinations within a sentence
the word "yet" in various contexts, indicating contrast or exception
Negative Logits
Token        Logit
rities       -0.86
tein         -0.71
regate       -0.68
edu          -0.67
ancial       -0.66
scribe       -0.66
rats         -0.66
gnu          -0.64
puters       -0.63
ilitating    -0.63
Positive Logits
Token        Logit
somehow      0.79
again        0.68
manages      0.67
inexpl       0.67
strangely    0.67
mirac        0.66
heric        0.66
managed      0.63
forth        0.63
surpass      0.63
Activations Density: 0.023%