INDEX
Explanations
questions starting with "can" or its variations
Negative Logits
ather      -0.18
gel        -0.16
anke       -0.16
ATHER      -0.16
mits       -0.15
athers     -0.15
lify       -0.15
_SECURE    -0.15
chen       -0.14
ois        -0.14
Positive Logits
you         0.24
't          0.24
’t          0.24
we          0.23
someone     0.21
uto         0.18
va          0.18
onic        0.17
apes        0.17
opy         0.17
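The two lists above rank vocabulary tokens by this feature's logit effect. Below is a minimal sketch of only the ranking step. It assumes the per-token effects are already available as a token-to-score mapping. The `top_logits` helper is hypothetical, and the small sample dict just reuses values listed above. This is not the dashboard's actual code.

```python
import heapq


def top_logits(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs.

    Hypothetical helper: assumes `effects` maps each token to its logit effect.
    """
    # Largest scores first for the positive list.
    positive = heapq.nlargest(k, effects.items(), key=lambda item: item[1])
    # Smallest (most negative) scores first for the negative list.
    negative = heapq.nsmallest(k, effects.items(), key=lambda item: item[1])
    return positive, negative


sample = {"you": 0.24, "we": 0.23, "someone": 0.21,
          "ather": -0.18, "gel": -0.16, "chen": -0.14}
pos, neg = top_logits(sample, 2)
print(pos)  # [('you', 0.24), ('we', 0.23)]
print(neg)  # [('ather', -0.18), ('gel', -0.16)]
```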
Activations Density 0.028%
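Activation density is the share of tokens on which the feature fires. Here is a minimal sketch that assumes "fires" means an activation strictly greater than zero. The `activation_density` helper is hypothetical. Under that assumption, a density of 0.028% means the feature is active on about 28 of every 100,000 tokens.

```python
def activation_density(activations: list[float]) -> float:
    """Fraction of tokens whose activation is strictly positive (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# 28 active tokens out of 100,000 gives a density of 0.028%.
acts = [0.0] * 100_000
for i in range(28):
    acts[i] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.028%
```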