Explanations
titles of movies, games, or films
mentions of various types of media, particularly titles of films and games
Negative Logits
procedure   -0.64
uez         -0.62
姫          -0.61
ction       -0.60
tenance     -0.60
LESS        -0.60
ople        -0.60
osate       -0.59
Reform      -0.59
esville     -0.59
Positive Logits
starring    1.00
linger      0.91
featuring   0.90
genres      0.86
titles      0.86
marketed    0.84
uggest      0.83
manship     0.83
paces       0.82
hops        0.81
Activations Density: 0.264%