INDEX
Explanations
film-related words, possibly related to reviews or evaluations
references to films and movies
Negative Logits
condition     -0.65
wheelchair    -0.62
bluff         -0.62
ridge         -0.60
stone         -0.60
LESS          -0.59
Islanders     -0.59
FUL           -0.59
mechanism     -0.59
nurse         -0.58
Positive Logits
ynthesis      1.03
earch         0.96
ovies         0.95
uggest        0.89
ystem         0.88
chool         0.87
ensitive      0.87
cape          0.86
aurus         0.85
starring      0.84
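The logit values above measure how strongly the feature pushes each token's next-token logit up (positive) or down (negative). Feature dashboards of this kind typically compute them as the dot product of the feature's decoder direction with each column of the model's unembedding matrix; the sketch below assumes that definition, since the exact computation and any normalization behind these numbers are not stated here. The function names and the toy vectors are hypothetical, chosen only to show the scoring and ranking steps.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: direct logit effect of one feature, assuming the effect on
# token t is dot(decoder_direction, unembedding_column[t]).
def feature_logit_effects(
    decoder_vec: List[float], unembed: Dict[str, List[float]]
) -> Dict[str, float]:
    return {
        token: sum(d * u for d, u in zip(decoder_vec, column))
        for token, column in unembed.items()
    }


# Hypothetical helper: split the effects into the k most positive and k most negative tokens.
def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    bottom = ranked[:k]                 # most negative first
    top = list(reversed(ranked))[:k]    # most positive first
    return top, bottom


# Toy 2-dimensional example; these numbers are invented, not taken from the model.
decoder = [1.0, 0.5]
unembedding = {
    "ovies": [0.8, 0.3],
    "starring": [0.6, 0.4],
    "wheelchair": [-0.5, -0.2],
    "stone": [-0.3, -0.2],
}
top, bottom = top_and_bottom(feature_logit_effects(decoder, unembedding), k=2)
print("positive:", [token for token, _ in top])     # prints positive: ['ovies', 'starring']
print("negative:", [token for token, _ in bottom])  # prints negative: ['wheelchair', 'stone']
```

In a real model the decoder vector would come from the sparse autoencoder and the unembedding from the language model, with one column per vocabulary entry; that is why subword fragments such as `ynthesis` and `ovies` show up in the lists.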
Activation Density: 0.049%
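Activation density is usually reported as the fraction of token positions in a sample on which the feature is active. Assuming "active" means an activation above zero, 0.049% corresponds to roughly one token in every 2,000 (1 / 0.00049 ≈ 2,041), so this is a very sparse feature. A minimal sketch of the calculation, where the function name and the `threshold` parameter are hypothetical:

```python
from typing import Sequence


# Hypothetical helper: fraction of positions whose activation exceeds `threshold`.
# The default threshold of 0.0 is an assumption; a dashboard may use a different cutoff.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample (invented): 5 active positions out of 10,000 gives a density of 0.05%.
sample = [0.0] * 9995 + [2.5] * 5
print(f"{activation_density(sample):.3%}")  # prints 0.050%
```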