Explanations
expressions of confidence and readiness
Negative Logits
ikat      -0.17
太éĥİ      -0.16
ngle      -0.16
ean       -0.15
tie       -0.15
Progress  -0.15
patched   -0.15
izada     -0.15
agon      -0.15
ugen      -0.15
Positive Logits
ertz      0.17
ä¾        0.15
Vend      0.15
elik      0.14
ëĬ        0.14
åĥ        0.14
omba      0.14
ÑĽ        0.13
िथ        0.13
chu       0.13
Activations Density 0.093%