Explanations
names of specific individuals
proper nouns, specifically names of individuals
Negative Logits
Token        Logit
Downloadha   -0.97
staking      -0.88
terday       -0.87
arial        -0.75
ually        -0.74
afort        -0.73
ariat        -0.72
uous         -0.72
ature        -0.70
childbirth   -0.68
Positive Logits
Token   Logit
Gow     0.83
nda     0.83
alus    0.79
tin     0.79
lists   0.79
mus     0.79
glers   0.79
swick   0.77
dy      0.77
lins    0.76
Activations Density: 0.023%