INDEX
Explanations
questions and references to "why" in contexts of confusion or exploration
Negative Logits
:numel          -0.15
ule             -0.15
eus             -0.14
.Void           -0.14
ÑĩиÑģ            -0.14
ãģ¦ãĤĤ          -0.13
-fontawesome    -0.13
.Bounds         -0.13
agnar           -0.13
etine           -0.13
Positive Logits
/how            0.32
Pant            0.18
soever          0.18
they            0.16
ulia            0.15
we              0.15
Mayo            0.14
itzer           0.14
there           0.14
it              0.14
Activations Density 0.023%
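
For a sense of scale, a density of 0.023% corresponds to roughly 23 active token positions per 100,000. The sketch below assumes density means the fraction of token positions where the feature's activation is above zero; the page does not state the exact definition, and the function name `activation_density`, the threshold, and the sample data are all hypothetical and synthetic, not values from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" = share of token positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 23 active positions out of 100,000 tokens.
sample = [1.5] * 23 + [0.0] * 99_977
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature this sparse fires on only a small slice of text, which fits the narrow explanation given above (questions and references to "why").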