Explanations
names and personal details in texts
connections between names and personal information
Negative Logits
worms        -0.85
iculture     -0.75
rats         -0.73
Increases    -0.71
drm          -0.70
nw           -0.69
itized       -0.69
hawks        -0.68
ranged       -0.67
bub          -0.66
Positive Logits
surname      1.30
nationality  1.26
initials     1.20
likeness     1.19
password     1.16
pronouns     1.09
address      1.08
password     1.07
suffix       1.05
badge        1.02
Activations Density 0.185%