Explanations
proper nouns, specifically names of individuals
Negative Logits
inel     -0.15
one      -0.15
owl      -0.15
eer      -0.15
oard     -0.14
forms    -0.14
:pk      -0.14
vis      -0.14
avit     -0.14
ine      -0.14
Positive Logits
wick     0.18
lashes   0.16
urch     0.15
ITES     0.15
ville    0.15
VILLE    0.14
erin     0.14
thur     0.14
Lance    0.14
steel    0.14
Activations Density: 0.063%