INDEX
Explanations
references to actors
references to actors and actresses
Negative Logits
£ı            -0.83
vironment     -0.72
ciating       -0.69
outine        -0.68
plings        -0.68
wart          -0.68
yss           -0.63
yip           -0.63
elsius        -0.63
ļé            -0.63
Positive Logits
rities        0.95
portraying    0.89
actor         0.82
writers       0.81
acters        0.80
Natalie       0.79
actress       0.76
actresses     0.72
plays         0.70
duo           0.69
Activations Density 0.031%