Explanations
sentences with evaluative language or expressions of opinion regarding various subjects
Negative Logits
Token      Logit
either     -0.25
Either     -0.22
instead    -0.21
even       -0.21
either     -0.20
竣         -0.20
simply     -0.19
Either     -0.19
instead    -0.19
einfach    -0.18
Positive Logits
Token          Logit
certainly      0.28
initially      0.28
technically    0.26
nomin          0.25
may            0.24
Certainly      0.22
may            0.21
Initially      0.21
occasionally   0.20
initial        0.20
Activations Density: 0.394%