INDEX
Explanations
phrases indicating quantity or extent
Negative Logits
omik      -0.16
VARIABLE  -0.14
modity    -0.13
aits      -0.13
oret      -0.13
cot       -0.13
culated   -0.13
alon      -0.13
uron      -0.13
faq       -0.12
Positive Logits
activity    0.29
talk        0.27
discussion  0.26
attention   0.26
emphasis    0.24
debate      0.24
activity    0.23
ink         0.22
effort      0.22
hoop        0.22
Activations Density: 0.125%