INDEX
Explanations
proper nouns, particularly names and significant locations
Negative Logits
ew                  -0.22
tered               -0.19
ively               -0.19
irma                -0.17
tle                 -0.17
(token not shown)   -0.17
erson               -0.17
readonly            -0.16
erc                 -0.16
rik                 -0.16
Positive Logits
leans                0.24
naments              0.24
iginal               0.23
べき                 0.22
izont                0.22
chestra              0.20
outines              0.20
tega                 0.20
ourke                0.19
angep                0.19
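Read as direct logit effects, the values above show which output tokens the feature pushes up (positive) or down (negative). One common way to produce such lists is to take the dot product of the feature's decoder direction with each column of the model's unembedding matrix, then keep the k largest and k smallest scores. This page does not confirm that method. It also does not say whether any extra normalization is applied. The sketch below is a minimal pure-Python illustration under that assumption. `logit_effects`, `top_logits`, the placeholder tokens, and the toy weights are all hypothetical.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: score every vocabulary token by projecting a feature's
# decoder direction through the unembedding matrix (a direct logit-effect view).
def logit_effects(decoder_dir: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """decoder_dir has length d_model; w_u has d_model rows and vocab_size columns."""
    if len(decoder_dir) != len(w_u):
        raise ValueError("decoder_dir length must equal the number of rows in w_u")
    vocab_size = len(w_u[0])
    # Score for token t is the dot product of decoder_dir with column t of w_u.
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(
    scores: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) lists of (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with invented weights: d_model = 2, four placeholder tokens.
vocab = ["A", "B", "C", "D"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.2, -0.1, 0.0, 0.3],
    [-0.08, 0.24, 0.38, 0.14],
]
scores = logit_effects(decoder_dir, w_u)
positive, negative = top_logits(scores, vocab, k=2)
print("positive:", [(tok, round(s, 2)) for tok, s in positive])  # positive: [('A', 0.24), ('D', 0.23)]
print("negative:", [(tok, round(s, 2)) for tok, s in negative])  # negative: [('B', -0.22), ('C', -0.19)]
```

With a real model, `w_u` would have tens of thousands of columns. A vectorized library would replace the nested loop, but the ranking step stays the same.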
Activation Density: 0.211%
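An activation density of 0.211% means the feature fires on roughly 2 of every 1,000 tokens in whatever sample it was measured on. The sketch below is a minimal illustration. It assumes density is the fraction of token positions where the activation is strictly above zero. The page does not state the threshold or the sample. `activation_density` and `format_percent` are hypothetical names, and the toy sample is invented.

```python
from typing import Iterable


# Hypothetical helper: activation density as the share of token positions where
# the feature's activation exceeds a threshold (assumed here to be 0).
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


def format_percent(fraction: float) -> str:
    # Three decimal places, matching the precision of the density figure above.
    return f"{fraction * 100:.3f}%"


# Invented toy sample: 211 firing positions out of 100,000 tokens.
sample = [0.0] * 99_789 + [1.5] * 211
density = activation_density(sample)
print(format_percent(density))  # 0.211%
```

A very low density like this one is typical of a narrowly scoped feature. Raising `threshold` above zero would only lower the reported figure.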