Explanations
names or parts of names related to individuals
proper nouns, particularly names
Negative Logits
sburg       -0.82
wagen       -0.78
e           -0.77
sylvania    -0.75
=-=-=-=-    -0.75
oğ          -0.67
eric        -0.66
edly        -0.65
eur         -0.63
er          -0.63
Positive Logits
plin        1.03
unta        0.94
ihara       0.82
uchi        0.80
ihad        0.79
rahim       0.78
wered       0.77
linger      0.75
umen        0.75
combe       0.74
Activations Density 0.094%