INDEX
Explanations
questions starting with "How" or "What" indicating inquiry or seeking information
Negative Logits
Token         Logit
/how          -0.16
how           -0.15
jev           -0.14
.amazonaws    -0.14
何 (Chinese for "what"/"how")   -0.14
s             -0.14
stru          -0.14
why           -0.14
x             -0.13
Tent          -0.13
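Tokens such as 何 can appear in raw dashboard exports as `ä½ķ`. This assumes the model uses a GPT-2-style byte-level BPE tokenizer, which the page does not confirm. Under that assumption, the odd string is the printable stand-in for the UTF-8 bytes `E4 BD 95`. The sketch below rebuilds GPT-2's byte-to-character table and inverts it. `bytes_to_unicode` follows GPT-2's published mapping, and `decode_token` is a hypothetical helper name:

```python
def bytes_to_unicode():
    """GPT-2-style reversible map from each byte (0-255) to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level BPE token string back into text."""
    byte_decoder = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(byte_decoder[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ä½ķ"))  # 何
```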
Positive Logits
Token         Logit
does          0.29
Does          0.29
do            0.26
did           0.24
does          0.24
Does          0.23
Do            0.22
Are           0.20
should        0.19
Should        0.19
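The page does not say how these logit lists were produced. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The names `w_dec`, `W_U`, and `top_logit_effects` are hypothetical. The sketch also ignores any final layer norm, and the toy matrices are illustrative only, not this feature's weights:

```python
import numpy as np


def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Rank tokens by how much a feature's direction pushes their logits.

    Assumption: effect = w_dec @ W_U (no layer norm), with
    w_dec of shape (d_model,) and W_U of shape (d_model, vocab_size).
    """
    effects = w_dec @ W_U                 # one logit change per vocab token
    order = np.argsort(effects)           # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2, five-token vocabulary.
vocab = ["how", "why", "does", "do", "x"]
W_U = np.array([[0.0, 0.0, 3.0, 2.0, 0.5],
                [2.0, 1.0, 0.0, 0.0, 0.5]])
w_dec = np.array([1.0, -1.0])

neg, pos = top_logit_effects(w_dec, W_U, vocab, k=2)
print(neg)  # [('how', -2.0), ('why', -1.0)]
print(pos)  # [('does', 3.0), ('do', 2.0)]
```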
Activation Density: 0.042%
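A density of 0.042% means the feature fires on roughly 42 of every 100,000 token positions. The sketch below assumes density is the fraction of sampled tokens where the activation exceeds zero. The page states neither the sample nor the threshold, and `activation_density` is a hypothetical name:

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Toy sample: 100,000 token positions, feature active on 42 of them.
acts = np.zeros(100_000)
acts[:42] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.042%
```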