INDEX
Explanations
proper nouns, particularly names associated with significant figures or characters
Negative Logits
yg      -0.16
nes     -0.15
ilt     -0.15
lob     -0.15
ní      -0.14
tos     -0.14
antha   -0.14
rig     -0.14
ren     -0.13
tas     -0.13
Positive Logits
amine         0.18
James         0.17
James         0.17
eger          0.17
Colin         0.15
amines        0.15
noinspection  0.15
McCartney     0.15
ünkü          0.14
james         0.14
Activations Density: 0.006%