Explanations
phrases indicating negation or denial
Negative Logits
üb          -0.19
ущ          -0.17
ouser       -0.16
ê¼          -0.14
936         -0.14
claimer     -0.14
часно       -0.14
pcodes      -0.14
927         -0.14
emento      -0.14
Positive Logits
overnight    0.26
static       0.23
overn        0.21
rocket       0.21
Overnight    0.20
exclusive    0.20
isolated     0.19
linear       0.19
rocket       0.18
magic        0.18
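The two logit lists above rank vocabulary tokens by how strongly this feature raises or lowers their output logits. A common way to produce such lists (assumed here, not confirmed by this page) is to project the feature's decoder direction through the model's unembedding matrix W_U and sort the per-token scores. The sketch uses plain Python lists; `logit_effects` and `top_and_bottom` are hypothetical names, and it ignores any final layer norm or rescaling a real pipeline might apply.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
) -> List[float]:
    """Dot the decoder direction with each vocabulary column of the unembedding."""
    d_model = len(decoder_direction)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    # Most negative first, mirroring a "Negative Logits" list.
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Most positive first, mirroring a "Positive Logits" list.
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three hypothetical tokens.
    effects = logit_effects([0.5, 0.2], [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
    positive, negative = top_and_bottom(effects, ["a", "b", "c"], k=1)
    print(positive)  # [('a', 0.5)]
    print(negative)  # [('c', -0.5)]
```

Near-duplicate entries such as the two "rocket" rows are expected under this scheme, since a tokenizer can hold distinct tokens that display identically (for example, with and without a leading space).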
Activation Density: 0.179%
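Activation density is presumably the fraction of sampled token positions on which the feature fires; 0.179% works out to roughly 1.8 active positions per 1,000 tokens. A minimal sketch, assuming a feature "fires" when its activation is strictly greater than 0 (real pipelines may use a different cutoff); `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as active only if its activation is
    strictly greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return active / total


if __name__ == "__main__":
    # Toy data: 179 firing positions out of 100,000 tokens.
    toy = [1.0] * 179 + [0.0] * (100_000 - 179)
    print(f"{activation_density(toy):.3%}")  # prints 0.179%
```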