Explanations
instances of the word "while"
Negative Logits
erap                 -0.16
ught                 -0.15
(token not shown)    -0.15
WARDED               -0.14
ãģĨãģ¡               -0.14
ieren                -0.14
geb                  -0.13
anine                -0.13
ké                   -0.13
ìĿ¸ëį°               -0.13
Positive Logits
there                0.36
it                   0.31
we                   0.28
there                0.27
some                 0.27
none                 0.25
nobody               0.23
many                 0.23
nothing              0.23
this                 0.23
Activations Density 0.069%