INDEX
Explanations
proper nouns, particularly names
references to specific individuals and their characteristics
Negative Logits
cknow      -0.84
estinal    -0.83
innie      -0.80
ufact      -0.78
Lumpur     -0.71
ishers     -0.69
izontal    -0.65
pez        -0.65
archy      -0.65
quished    -0.65
Positive Logits
erous      0.79
hurst      0.75
ylon       0.69
crow       0.67
xus        0.66
stick      0.66
son        0.62
Stain      0.62
Crawford   0.62
Rica       0.61
Activations Density: 0.036%