INDEX
Explanations
negations or exceptions
phrases indicating negation or the absence of something
Negative Logits
ixel     -0.76
tein     -0.72
rift     -0.68
tty      -0.67
creen    -0.66
velt     -0.66
istle    -0.64
stone    -0.64
Run      -0.64
arten    -0.64
Positive Logits
necessarily     1.31
icable          1.19
icably          1.13
etheless        1.04
epad            1.04
eworthy         1.02
withstanding    0.97
bothering       0.77
exactly         0.77
bothered        0.77
Activations Density: 0.044%