INDEX
Explanations
proper names or nouns used to describe individuals
proper nouns, specifically people's names
Negative Logits
Token      Logit
pron       -0.67
uay        -0.66
taboola    -0.64
mble       -0.64
uminati    -0.60
berra      -0.59
į          -0.58
âĢº        -0.57
abwe       -0.57
ymm        -0.57
Positive Logits
Token      Logit
's         0.70
kson       0.65
herself    0.65
wore       0.62
swore      0.60
saw        0.59
Introduced 0.58
himself    0.57
realised   0.57
owan       0.57
Activations Density 0.249%
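
As a rough illustration of the density figure above, the sketch below computes the fraction of token positions on which a feature is active. It assumes that "activation density" means the share of positions with an activation above zero, which this page does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 4000 token positions, 10 of which activate the feature.
sample = [0.0] * 3990 + [1.5] * 10
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.250%
```

A density near 0.25% would mean the feature fires on roughly one token in 400, which fits a feature that responds only to a narrow class of tokens such as people's names.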