INDEX
Explanations
names of people, particularly those related to sports or media
Negative Logits
ůr        -0.18
bole      -0.15
sûr       -0.15
(crate    -0.15
aton      -0.15
bul       -0.15
ÑĥлÑİ     -0.15
ordova    -0.15
abler     -0.14
pras      -0.14
Positive Logits
å·¥       0.18
Wonder    0.16
wonder    0.15
daÅŁ      0.14
Offsets   0.14
actic     0.14
945       0.14
xx        0.14
qi        0.14
ight      0.13
Activations Density 0.120%