INDEX
Explanations
sentences that end with a period and potentially have quotations and descriptions of people's reactions
sentences that express skepticism or disbelief
Negative Logits
Token       Logit
levers      -0.65
mine        -0.64
HM          -0.64
sleeper     -0.63
optional    -0.63
unsus       -0.62
»Ĵ          -0.62
united      -0.61
hesda       -0.61
leigh       -0.60
Positive Logits
Token          Logit
"[             1.21
Asked          1.14
Instead        1.08
Speaking       1.05
He             1.02
Whereas        1.01
Specifically   0.99
"#             0.98
Saying         0.98
Whenever       0.97
Activations Density: 0.333%