Explanations
names of specific individuals or celebrities
names of individuals associated with film or entertainment
Negative Logits
Token       Logit
bers        -1.23
dro         -0.80
bled        -0.78
dropping    -0.78
ãĤº         -0.73
bling       -0.72
behind      -0.72
blers       -0.72
bered       -0.71
govtrack    -0.70
Positive Logits
Token       Logit
esson       0.89
atan        0.88
terness     0.77
Sheen       0.77
agra        0.76
arians      0.75
stellar     0.71
alien       0.70
aples       0.69
naires      0.68
Activations Density: 0.032%