INDEX
Explanations
proper nouns, specifically names of notable individuals
Negative Logits
nonzero       -0.19
alem          -0.14
mentioning    -0.14
Afr           -0.14
â̦           -0.14
and           -0.14
mentioned     -0.14
aje           -0.13
incl          -0.13
[             -0.13
Positive Logits
itbart        0.18
izontal       0.17
adipiscing    0.17
ikal          0.16
κÏĮ           0.16
stalk         0.15
ÏİÏģα         0.15
ardin         0.15
avo           0.15
bserv         0.14
Activations Density 0.000%