Explanations
proper nouns, particularly names associated with historical figures and notable individuals
Negative Logits
Token     Logit
ries      -0.17
229       -0.15
umed      -0.15
827       -0.15
sis       -0.15
olia      -0.15
ordes     -0.14
orne      -0.14
orum      -0.14
ies       -0.13
Positive Logits
Token     Logit
uai       0.15
fitte     0.15
æķ·       0.14
zad       0.14
/type     0.14
Tos       0.14
Ģìŀ¥      0.13
hots      0.13
inho      0.13
ùy        0.13
Activation Density: 0.113%