Explanations
Negative values represented in various contexts, possibly indicating ratings or scores.
Negative Logits
ayer                -0.16
_sdk                -0.15
rh                  -0.14
egend               -0.14
á»iji               -0.14
hey                 -0.14
upal                -0.14
.sdk                -0.14
irect               -0.13
_MISC               -0.13
Positive Logits
BuilderInterface     0.15
Independ             0.15
gba                  0.14
nsic                 0.14
zung                 0.14
ê·                   0.14
Enlight              0.14
mlin                 0.14
entar                0.14
กต                   0.13
Activation Density: 0.002%