Explanations
Questions beginning with "Why"
Rhetorical questions that challenge reasoning or assumptions
Negative Logits
Token          Logit
interstitial   -0.66
覚醒           -0.65
ILY            -0.63
eki            -0.60
unal           -0.60
aukee          -0.60
iece           -0.60
ipes           -0.60
apsed          -0.59
tips           -0.59
Positive Logits
Token          Logit
bother         1.19
shouldn        1.07
wouldn         1.05
aren           1.04
didn           1.01
did            1.00
does           0.99
hasn           0.98
don            0.95
should         0.93
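The logit lists above show which output tokens this feature pushes up or down. SAE dashboards commonly produce such lists by projecting the feature's decoder direction onto the model's unembedding and keeping the k highest and k lowest results. This page does not confirm that method, so treat it as an assumption. The minimal pure-Python sketch below uses invented two-dimensional vectors, not data from this feature. The helper names `logit_effects` and `top_and_bottom` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


# Hypothetical helper: assumes logit effect = decoder direction . unembedding vector.
def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


# Hypothetical helper: one sort, then slice both ends.
def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) token lists, each ordered from strongest to weakest."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return positive, negative


# Toy data, invented for illustration only.
decoder_dir = [1.0, -0.5]
unembed = {
    "bother": [1.0, -0.2],
    "should": [0.8, 0.1],
    "tips": [-0.5, 0.3],
    "eki": [-0.7, 0.0],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print([tok for tok, _ in positive])  # ['bother', 'should']
print([tok for tok, _ in negative])  # ['eki', 'tips']
```

Sorting once and slicing both ends keeps the two lists consistent. A real dashboard may add steps this sketch omits, such as folding in layer normalization.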
Activation density: 0.044%
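Activation density is read here as the percentage of tokens on which the feature fires. This assumes "fires" means an activation strictly above zero, because the page does not state its threshold. Under that reading, 0.044% is roughly 44 activating tokens per 100,000. The following is a minimal sketch with a hypothetical `activation_density` helper and synthetic activations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Synthetic example: 44 nonzero activations spread across 100,000 tokens.
acts = [0.0] * 100_000
for i in range(44):
    acts[i * 2_000] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.044%
```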