Explanations
questions beginning with "How."
Negative Logits
most        -0.17
ça          -0.17
nowhere     -0.15
šk          -0.15
idis        -0.14
ório        -0.14
Wel         -0.14
owell       -0.14
cheon       -0.14
it          -0.14
Positive Logits
itz         0.18
ells        0.16
IGHL        0.15
dy          0.15
does        0.15
æĺ¯æĪij      0.15
do          0.14
ever        0.14
ORIZONTAL   0.14
ERSHEY      0.14
Activations Density 0.043%