Explanations
questions that express disbelief or surprise
Negative Logits
Token     Logit
ounder    -0.16
uries     -0.15
ÄĽÅ¾      -0.15
jem       -0.15
Ã¥n       -0.15
olec      -0.15
awn       -0.14
алÑİ      -0.14
oor       -0.14
è²Į       -0.14
Positive Logits
Token     Logit
did       0.19
.did      0.18
planet    0.18
do        0.17
Next      0.16
)did      0.16
planet    0.16
kind      0.16
about     0.15
happened  0.15
Activations Density: 0.058%
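
The density figure is presumably the share of sampled token positions on which this feature is active. The page does not state the exact definition, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as "active" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, with the feature active on 6.
sample = [0.0] * 9994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(sample)

# Format as a percentage, matching the way the dashboard reports density.
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse: it stays silent on nearly all tokens and fires only in the narrow context described by the explanation.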