INDEX
Explanations
references to movies or actors
instances of the word "starring" in relation to movies and performances
Negative Logits
utral       -0.85
abus        -0.77
regulated   -0.76
Ĥİ          -0.74
apa         -0.73
veyard      -0.72
nea         -0.72
oard        -0.71
nsic        -0.71
adem        -0.71
Positive Logits
starring    1.19
Dust        0.78
stars       0.77
ãĤ¤ãĥĪ      0.76
Credits     0.76
Actress     0.75
Pengu       0.75
starred     0.75
stars       0.71
Features    0.71
Activations Density: 0.008%
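
Two entries in the logit lists above, Ĥİ and ãĤ¤ãĥĪ, look like byte-level BPE token strings rather than readable text. The sketch below shows one way to map such strings back to raw bytes. It assumes the model uses a GPT-2 style byte-to-unicode table, which this page does not state; the helper names (bytes_to_unicode, token_bytes, decode_token) are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Printable bytes map to themselves; every other byte is shifted to a code point >= 256.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Return the raw bytes behind a byte-level token string."""
    return bytes(BYTE_DECODER[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode a token string to text; invalid UTF-8 becomes U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


print(decode_token("ãĤ¤ãĥĪ"))  # イト
print(token_bytes("Ĥİ"))        # b'\x82\x8e'
```

Under that assumption, ãĤ¤ãĥĪ decodes to the katakana イト, while Ĥİ corresponds to two UTF-8 continuation bytes and so is only a fragment of a character, not standalone text.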