Explanations
movie titles, specifically focusing on the presence of the word "starring"
references to actors and their roles in films
Negative Logits
Token          Logit
alde           -0.70
EAR            -0.68
upon           -0.63
individual     -0.62
oat            -0.62
IB             -0.61
instr          -0.61
abis           -0.60
XT             -0.60
individuals    -0.59
Positive Logits
Token          Logit
starring       3.74
starred        1.65
featuring      1.52
cameo          1.13
stars          1.09
Featuring      1.09
showcasing     1.07
portraying     1.05
depicting      1.04
stars          1.04
Activations Density: 0.014%