Explanations
instances of the word "first" and indicators of prominence or ranking
Negative Logits
igg     -0.17
owi     -0.16
coli    -0.14
zev     -0.14
Helm    -0.14
apper   -0.14
rq      -0.14
rah     -0.14
annon   -0.14
729     -0.14
Positive Logits
cname       0.14
indle       0.14
kuk         0.14
Signing     0.14
mai         0.14
composite   0.14
Hud         0.14
uda         0.14
èı          0.13
лÑİ         0.13
Activations Density: 0.084%