Explanations
contradictions or opposing statements
occurrences of the word "yet" indicating contrast or concession
Negative Logits
rities      -0.77
edu         -0.76
tein        -0.70
ancial      -0.68
gang        -0.66
ussions     -0.63
Times       -0.63
Chicken     -0.63
puters      -0.62
cases       -0.62
Positive Logits
somehow     0.97
mirac       0.78
behold      0.78
inexpl      0.74
strangely   0.73
heric       0.73
tons        0.71
chart       0.69
somew       0.69
again       0.68
Activations Density: 0.024%