Explanations
mentions of actors in various contexts
references to actors and their roles or performances
Negative Logits
Token         Logit
PRESS         -0.68
UTERS         -0.67
Territories   -0.65
yx            -0.62
£ı            -0.59
ggies         -0.59
fty           -0.59
Peg           -0.58
ãĥ¼ãĥ³        -0.58
Truth         -0.58
Positive Logits
Token         Logit
actors        1.23
rities        1.06
actor         1.06
writers       0.96
actresses     0.92
Actor         0.87
Actor         0.85
acters        0.83
singers       0.81
actress       0.80
Activations Density 0.008%