Explanations
terms related to measurement or magnitude across various contexts
Negative Logits
zelf     -0.19
ernals   -0.17
iates    -0.16
est      -0.16
sell     -0.16
reads    -0.15
urer     -0.14
nings    -0.14
role     -0.14
urers    -0.14
Positive Logits
-up       0.24
-down     0.23
able      0.22
ToFit     0.20
-out      0.20
out       0.20
tron      0.19
way       0.17
ardy      0.17
azy       0.17
Activation Density: 0.016%
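The page does not say how activation density is defined. A common reading, assumed here and not confirmed by the source, is the percentage of sampled tokens on which the feature's activation is above zero. Under that assumption, 0.016% means roughly 16 active tokens per 100,000. The function name `activation_density` and the `threshold` parameter in this sketch are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" = share of tokens with activation > 0;
    the dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 16 nonzero activations among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(16):
    acts[i * 6000] = 1.5  # distinct positions 0, 6000, ..., 90000

print(f"{activation_density(acts):.3f}%")  # 0.016%
```

A density this low means the feature fires on only a small fraction of the tokens it was evaluated on.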