Explanations
questions ending in '?'
sentences that contain questions or exclamations
Negative Logits
amas         -0.64
duct         -0.62
heastern     -0.61
isol         -0.61
ciplinary    -0.61
hran         -0.60
anos         -0.59
oreal        -0.58
reven        -0.58
onis         -0.57
Positive Logits
Then          1.20
Well          1.08
Nope          1.06
Immediately   1.04
Later         1.01
And           1.00
Because       0.99
Which         0.98
Yeah          0.97
Suddenly      0.95
Activations Density 0.143%