INDEX
Explanations
questions in the text
rhetorical questions
Negative Logits
specific         -0.74
critical         -0.69
myster           -0.67
clin             -0.67
disciplines      -0.67
mascul           -0.67
mination         -0.66
eni              -0.65
satell           -0.65
precise          -0.64
Positive Logits
Nope             1.51
Well             1.47
Yep              1.35
Try              1.30
Yeah             1.29
Probably         1.27
Consider         1.25
Then             1.22
Congratulations  1.22
Maybe            1.21
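Feature dashboards of this kind usually build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest values. Assuming that is what this page shows, the minimal pure-Python sketch below reproduces the ranking step. The function names `logit_effects` and `top_and_bottom` are hypothetical, and the toy vocabulary and weights are made up for illustration; they are not this feature's real parameters.

```python
# Hypothetical sketch: assumes the "logits" are the feature's decoder
# direction multiplied by the model's unembedding matrix W_U.

def logit_effects(decoder_vec, unembed):
    """Return decoder_vec @ unembed, one value per vocabulary token.

    decoder_vec: list of length d_model (the feature's decoder row).
    unembed: d_model rows, each a list of length vocab_size (W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k=10):
    """Return the k most positive and k most negative (token, value) pairs."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]        # most negative first
    positive = [(vocab[j], effects[j]) for j in order[::-1][:k]]  # largest first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens (made-up numbers).
vocab = ["Nope", "specific", "critical", "Well"]
decoder = [1.0, 0.5]
unembed = [
    [1.0, -1.0, 0.0, 0.2],
    [0.0, 0.4, -2.0, 0.2],
]
positive, negative = top_and_bottom(logit_effects(decoder, unembed), vocab, k=2)
print(positive)  # Nope first, then Well
print(negative)  # critical first, then specific
```

With k=10 on a real vocabulary, the same ranking would yield two ten-row lists shaped like the ones above.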
Activation Density: 0.096%
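Activation density is commonly reported as the share of tokens on which the feature fires. A density of 0.096% would mean the feature is active on roughly 1 in every 1,040 tokens (1 / 0.00096 ≈ 1041.7). The sketch below assumes "fires" means an activation strictly above zero. The `activation_density` helper and its `threshold` parameter are hypothetical, not taken from any particular dashboard's code.

```python
# Hypothetical sketch: assumes "density" means the share of tokens
# on which the feature's activation is strictly above the threshold (0 by default).

def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 96 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(96):
    acts[i * 1000] = 1.0  # arbitrary positive activations at distinct positions
print(activation_density(acts))  # 0.096
```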