Explanations
mentions of specific names or proper nouns related to individuals
Negative Logits
er      -0.27
o       -0.26
y       -0.20
een     -0.19
ing     -0.19
oxy     -0.18
otic    -0.17
oq      -0.16
eria    -0.16
echn    -0.16
Positive Logits
ipeg        0.26
ings        0.25
sylvania    0.23
nn          0.23
ovation     0.22
ibal        0.22
ery         0.22
ounced      0.21
iversary    0.21
egan        0.20
Activations Density 0.030%