Explanations
phrases that express disappointment or critique of movies
Negative Logits
Token      Logit
mistr      -0.16
owers      -0.15
ulary      -0.14
olla       -0.14
aptic      -0.14
illard     -0.14
pheric     -0.13
inand      -0.13
cant       -0.13
Canon      -0.13
Positive Logits
Token        Logit
moderate     0.21
лаж          0.19
manageable   0.18
harmless     0.18
worst        0.17
merely       0.17
worse        0.17
relatively   0.17
Moderate     0.17
mild         0.16
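The negative and positive lists rank vocabulary tokens by how strongly the feature directly pushes their logits down or up. A common way to get such lists is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below assumes that approach; the function name `top_logit_effects` and its arguments are hypothetical, and the dashboard's exact pipeline may differ.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (a sketch).

    decoder_dir: shape (d_model,), assumed decoder direction of the feature.
    W_U: shape (d_model, d_vocab), assumed unembedding matrix of the model.
    vocab: list of d_vocab token strings, indexed like the columns of W_U.
    """
    decoder_dir = np.asarray(decoder_dir, dtype=float)
    W_U = np.asarray(W_U, dtype=float)
    # Project the feature direction onto every vocabulary token.
    logits = decoder_dir @ W_U  # shape (d_vocab,)
    # argsort is ascending, so the first k indices are the most negative.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    # Reverse the order to read off the k most positive tokens.
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with d_model = 2 and a four-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = [[0.2, -0.1, 0.05, -0.3],
       [0.0, 0.0, 0.0, 0.0]]
positive, negative = top_logit_effects([1.0, 0.0], W_U, vocab, k=2)
print(positive)
print(negative)
```

In the toy example the positive list is `a` then `c`, and the negative list is `d` then `b`.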
Activation density: 0.356%
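Activation density reports how often the feature fires. A density of 0.356% means it is active on roughly 3.6 of every 1,000 tokens. The sketch below assumes "active" means an activation strictly above zero over a sample of tokens; the threshold and the helper name `activation_density` are assumptions, not the dashboard's documented definition.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold` (a sketch)."""
    acts = np.asarray(activations, dtype=float)
    # Count tokens where the feature is active, then express the count as a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: the feature fires on 2 of 8 tokens.
print(activation_density([0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]))
```

On the toy input, two of eight activations are positive, giving a density of 25.0%.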