Explanations
questions or statements ending in a question mark
conversational phrases and rhetorical questions
Negative Logits
places    -0.74
chairs    -0.69
gra       -0.68
cedes     -0.65
forts     -0.65
lly       -0.61
chat      -0.61
stag      -0.60
ties      -0.59
inside    -0.59
Positive Logits
Thou                  0.76
thou                  0.70
anybody               0.68
ya                    0.67
ILCS                  0.66
there                 0.66
YOU                   0.63
Ya                    0.62
THEY                  0.62
guiActiveUnfocused    0.62
Activations Density 0.112%