Explanations
names of famous actors and celebrities
names of prominent individuals, particularly actors and their achievements
Negative Logits
Token     Logit
sic       -0.93
)."       -0.73
'."       -0.68
.""       -0.66
âĵĺ       -0.66
.).       -0.65
Ire       -0.65
.")       -0.65
.'"       -0.59
espie     -0.58
Positive Logits
Token      Logit
Doesn      0.63
Canaver    0.63
Aren       0.60
lishes     0.60
Isn        0.60
Everyday   0.59
Groups     0.59
'?         0.58
Influence  0.58
LIMITED    0.57
Activations Density 0.737%