INDEX
Explanations
phrases emphasizing rankings or comparisons regarding films or media
Negative Logits
Token        Logit
eczy         -0.15
arnation     -0.15
fiss         -0.14
że           -0.14
yn           -0.14
oss          -0.14
edn          -0.14
ood          -0.14
Ross         -0.14
readcr       -0.13
Positive Logits
Token        Logit
note         0.17
ever         0.17
bunch        0.16
EVER         0.16
aucoup       0.15
any          0.15
ezi          0.15
recent       0.15
illac        0.15
consequence  0.14
Activations Density: 0.025%