Explanations
proper nouns, specifically names of individuals and entities
Negative Logits
anken     -0.16
ẽ         -0.15
kul       -0.15
usalem    -0.15
ATORY     -0.14
ledi      -0.14
oise      -0.14
οκ        -0.13
abbit     -0.13
Giles     -0.13
Positive Logits
Jr        0.20
III       0.16
rast      0.15
III       0.14
ras       0.13
Chief     0.13
nist      0.13
šit       0.13
å¼        0.13
aid       0.13
Activations Density 0.115%