Explanations
names of individuals or characters
proper nouns, specifically names of individuals
Negative Logits
iculty     -0.79
antry      -0.79
istry      -0.74
hedral     -0.74
itizen     -0.73
imates     -0.71
rants      -0.71
ropy       -0.70
fulness    -0.70
ribute     -0.69
Positive Logits
thal        0.81
rities      0.80
ously       0.73
ova         0.69
Rove        0.66
vow         0.66
eers        0.65
neuron      0.64
Osw         0.63
ocal        0.61
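The two lists above are the ten most negative and ten most positive values in a per-token ranking. The page does not say how the values are computed; assuming each is the feature's effect on one output-token logit, a minimal sketch of how such a top/bottom-k view could be produced from a hypothetical token-to-effect mapping is:

```python
def top_and_bottom(logit_effects, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    `logit_effects` is a hypothetical dict mapping token strings to the
    feature's (assumed) effect on that token's output logit.
    """
    # Ascending sort: most negative effects come first; ties keep input order.
    negative = sorted(logit_effects.items(), key=lambda kv: kv[1])[:k]
    # Descending sort: most positive effects come first; ties keep input order.
    positive = sorted(logit_effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive


# A few entries copied from the lists above.
effects = {"thal": 0.81, "rities": 0.80, "iculty": -0.79, "antry": -0.79, "neuron": 0.64}
neg, pos = top_and_bottom(effects, k=2)
print(neg)  # [('iculty', -0.79), ('antry', -0.79)]
print(pos)  # [('thal', 0.81), ('rities', 0.8)]
```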
Activations Density 0.104%
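The page reports the density without defining it. Assuming it means the share of sampled token positions on which the feature's activation is above zero, a minimal sketch of the calculation (the function name and the sample data are hypothetical) is:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes density = (#positions with activation > threshold) / (#positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 2 of 1000 tokens.
acts = [0.0] * 998 + [3.1, 0.7]
print(f"{activation_density(acts):.3%}")  # 0.200%
```

Under this reading, a density of 0.104% would mean the feature fires on roughly one token in a thousand, which fits the narrow "names of individuals" explanations.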