Explanations
the concept of "work" and various forms of its usage
Negative Logits
alsy       -0.17
cki        -0.16
æĺŃ        -0.15
oria       -0.14
èĢĥ        -0.14
apl        -0.14
ELLOW      -0.14
ateria     -0.14
ORIA       -0.14
indent     -0.14
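Tokens such as æĺŃ and èĢĥ in the list above do not appear to be corruption. They look like GPT-2-style byte-level BPE strings, in which every UTF-8 byte is displayed as a printable stand-in character. The sketch below assumes that kind of vocabulary (the dashboard does not name the model) and reverses the mapping. `bytes_to_unicode` follows the byte-to-character table used by GPT-2's tokenizer, while `decode_token` is a hypothetical helper name introduced here.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style byte -> printable character table (assumed vocabulary)."""
    # Printable bytes keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    # Remaining bytes (control chars, space, etc.) are shifted to 256 and up.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed token string back into text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A lone token may hold a partial UTF-8 sequence, so do not raise on it.
    return raw.decode("utf-8", errors="replace")


for tok in ["æĺŃ", "èĢĥ", "alsy"]:
    print(tok, "->", decode_token(tok))
```

Under that assumption, æĺŃ and èĢĥ decode to the Chinese characters 昭 and 考, while ASCII-only tokens such as alsy are unchanged.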
Positive Logits
backward    0.21
out         0.21
toward      0.20
towards     0.20
harder      0.19
backwards   0.19
magic       0.17
shopping    0.17
ozem        0.16
through     0.16
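One common way to produce lists like the two above, assumed here rather than confirmed by the dashboard, is to project the feature's decoder direction through the model's unembedding matrix and keep the k largest and k smallest entries. The minimal pure-Python sketch below uses hypothetical names (`logit_weights`, `top_and_bottom`) and toy numbers; a real model would use a tensor library and might also fold in the final layer norm.

```python
import heapq


def logit_weights(direction: list[float], w_u: list[list[float]]) -> list[float]:
    """Project a d_model-sized feature direction through w_u, an unembedding
    matrix stored as [d_model][vocab_size]; returns one weight per token."""
    vocab_size = len(w_u[0])
    return [
        sum(direction[i] * w_u[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom(weights: list[float], vocab: list[str], k: int):
    """Return the k most-boosted and k most-suppressed (token, weight) pairs."""
    pairs = list(zip(vocab, weights))
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    return positive, negative


# Toy example with hypothetical numbers: d_model = 2, vocabulary of 3 tokens.
direction = [1.0, 0.5]
w_u = [[0.5, -0.5, 0.0],
       [0.0, 0.0, 0.5]]
vocab = ["tok_a", "tok_b", "tok_c"]

weights = logit_weights(direction, w_u)
positive, negative = top_and_bottom(weights, vocab, k=1)
print("positive:", positive)
print("negative:", negative)
```

With k = 10 over a full vocabulary, the same two calls would yield lists shaped like the Positive Logits and Negative Logits entries.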
Activation density: 0.042%
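Activation density is the share of sampled token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the dashboard does not state its threshold or sample size), 0.042% works out to about 42 firing positions per 100,000 tokens. The sketch below uses synthetic activations and a hypothetical `activation_density` helper to show the arithmetic.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic example: 100,000 token positions, the feature fires on 42 of them.
acts = [0.0] * 100_000
for i in range(42):
    acts[i * 1_000] = 1.0

density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```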