Explanations
expressions of surprise or emphasis
Negative Logits
Token      Logit
eniable    -0.17
ilon       -0.15
Apprec     -0.15
ÄĽÅ¾       -0.15
idious     -0.15
enty       -0.15
ounder     -0.14
oor        -0.14
ãn         -0.14
jem        -0.14
Positive Logits
Token      Logit
else       0.20
more       0.20
a          0.20
do         0.19
could      0.18
better     0.18
timing     0.17
an         0.17
sis        0.17
did        0.16
Activations Density: 0.043%