Explanations
numerical values related to performance metrics
Negative Logits
Kris        -0.17
_↵          -0.16
áy          -0.15
exc         -0.15
ary         -0.14
azar        -0.14
ugi         -0.14
regation    -0.14
ú           -0.14
Thr         -0.14
Positive Logits
istrovství  0.17
gons        0.15
presso      0.15
ambi        0.14
sounding    0.14
典          0.14
agina       0.13
phins       0.13
amat        0.13
/misc       0.13
Activations Density 0.100%