INDEX
Explanations
names, particularly first names of individuals
Negative Logits
Token       Logit
Lamar       -0.76
Newt        -0.68
José        -0.67
Reverend    -0.67
Buddy       -0.67
Shaun       -0.66
Enrique     -0.65
Pixie       -0.64
Dwight      -0.64
Weird       -0.62
Positive Logits
Token       Logit
ovich       0.98
acci        0.95
otte        0.94
ovsky       0.94
opoulos     0.93
enstein     0.92
insky       0.92
shaw        0.92
atos        0.91
fman        0.91
Activations Density 0.053%