Explanations
repeated suffixes or patterns in names
Negative Logits
islav   -0.15
sko     -0.15
ET      -0.14
元素    -0.14
apk     -0.14
อบ      -0.14
etsk    -0.14
cba     -0.14
Mend    -0.14
elah    -0.14
Positive Logits
ipar    0.17
raya    0.16
uner    0.16
ENO     0.15
lux     0.15
rips    0.15
332     0.14
flux    0.14
227     0.14
في      0.14
Activations Density: 0.004%