Explanations
sentences with strong emotional responses
sentences that express opinions or reactions
Negative Logits
thal          -0.86
hay           -0.75
glim          -0.73
prey          -0.72
transition    -0.72
hemer         -0.71
transitions   -0.71
pit           -0.71
disemb        -0.71
waterfall     -0.70
Positive Logits
Though        1.15
Instead       1.10
His           1.08
Their         1.06
Ironically    1.06
Perhaps       1.05
Already       1.05
Particularly  1.04
Critics       1.03
They          1.01
Activation Density: 0.810%