Explanations
Elements that involve comparisons and contrasts, particularly in movie reviews
Negative Logits
作品 ("works")  -0.16
nodoc           -0.15
antage          -0.15
cestor          -0.15
encer           -0.14
jed             -0.14
ABCDEFG         -0.14
prung           -0.14
/operator       -0.14
Albums          -0.14
Positive Logits
characters      0.23
cast            0.23
female          0.22
secondary       0.20
character       0.19
male            0.18
plot            0.18
female          0.17
central         0.17
plot            0.17
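The positive and negative logit lists can be read as the tokens this feature most directly promotes and suppresses. The sketch below assumes, without confirmation from this page, that they come from the usual direct-logit view: the feature's decoder direction is projected through the model's unembedding matrix, and the highest and lowest scoring vocabulary entries are listed. The functions `logit_effects` and `top_and_bottom` are hypothetical names chosen for illustration, and the toy numbers are made up.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumed shapes: decoder_dir has length d_model, unembed is
    d_model x vocab_size. Returns one logit contribution per vocab entry.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[k] * unembed[k][v] for k in range(d_model))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and k most suppressed (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
effects = logit_effects([1.0, 0.5],
                        [[0.2, -0.1, 0.0, 0.3],
                         [0.1, 0.2, -0.4, 0.0]])
top, bottom = top_and_bottom(effects, ["a", "b", "c", "d"], 2)
```

On a real model the vocabulary has tens of thousands of entries, which is why only the ten extremes on each side are shown above; this direct projection ignores any nonlinear downstream effects of the feature.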
Activation Density: 0.257%
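Activation density is most naturally read as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero, which is the common convention but is not stated on this page; `activation_density` is a hypothetical helper. Under that reading, 0.257% corresponds to roughly 257 active tokens per 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > 0.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Synthetic data: 257 active positions out of 100,000.
sample = [0.0] * 99_743 + [1.0] * 257
density = activation_density(sample)
```

A low density like this one means the feature is sparse: it is silent on the vast majority of tokens and activates only in a narrow set of contexts.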