Explanations
phrases indicating an increase or emphasis on quantity
Negative Logits
vern      -0.18
uations   -0.16
imes      -0.15
atat      -0.15
ÄįÃŃ      -0.15
icable    -0.14
astro     -0.14
edn       -0.14
ê·¹       -0.14
woke      -0.14
Positive Logits
time      0.26
several   0.24
course    0.22
ha        0.21
the       0.19
multiple  0.18
roughly   0.17
ture      0.17
many      0.17
views     0.17
Activation Density: 0.029%