INDEX
Explanations
proper nouns, specifically names of individuals and entities
Negative Logits
p         -0.17
shooting  -0.15
sort      -0.15
ac        -0.15
ret       -0.14
dist      -0.14
feed      -0.14
dess      -0.14
sorts     -0.14
tun       -0.14
Positive Logits
clair     0.16
mại       0.15
má        0.15
лÑıв      0.15
avras     0.14
ĥĿ        0.14
ê³        0.14
á»ijng    0.14
ëıĻ       0.14
iete      0.14
Activations Density 0.044%