INDEX
Explanations
question marks ("?") in sentences
questions posed in the text
Negative Logits
Token       Logit
rod         -0.82
shaw        -0.78
bage        -0.69
esville     -0.66
uted        -0.64
Gall        -0.64
cephal      -0.64
handshake   -0.63
sleeper     -0.62
inki        -0.61
Positive Logits
Token       Logit
?           0.93
.?          0.93
???         0.81
?:          0.78
Ħ¢          0.75
Nope        0.74
Laure       0.73
Seems       0.73
?,          0.73
¶           0.73
Activations Density: 0.005%