INDEX
Explanations
proper nouns, specifically names of individuals, likely related to news or entertainment
Negative Logits
xual        -0.94
ensical     -0.93
acebook     -0.79
glim        -0.78
yrinth      -0.76
ibilities   -0.75
disadvant   -0.75
rival       -0.74
nces        -0.74
joined      -0.73
Positive Logits
Slater      0.89
Anne        0.89
Sue         0.83
town        0.81
eland       0.79
Clarkson    0.75
Loll        0.73
terson      0.72
edge        0.71
sey         0.70
Activations Density 8.291%