INDEX
Explanations
- Phrases contrasting different perspectives or approaches
- Phrases or constructs that indicate negation or the idea of "not"
Negative Logits
eur        -0.74
kamp       -0.72
stakes     -0.69
ction      -0.68
velt       -0.64
oise       -0.63
itor       -0.63
hower      -0.62
WER        -0.62
itiz       -0.61
Positive Logits
necessarily    1.39
icably         1.33
epad           1.14
icable         1.06
withstanding   0.97
orious         0.96
bothering      0.91
etheless       0.90
eworthy        0.85
exactly        0.84
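Dashboards of this kind commonly derive the negative and positive logit lists by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that method; the function `top_logit_effects`, its parameters, and the toy matrices are hypothetical illustrations, not values or code from this dashboard.

```python
import numpy as np

def top_logit_effects(feature_direction, unembed, vocab, k=10):
    """Rank tokens by how much a feature's direction lowers or raises their logits.

    Assumed shapes (hypothetical, for illustration):
      feature_direction: (d_model,)  decoder row for one feature
      unembed:           (d_model, vocab_size)  unembedding matrix
      vocab:             list of vocab_size token strings
    """
    # Direct effect of the feature direction on every token's logit.
    logit_effects = feature_direction @ unembed  # shape (vocab_size,)
    order = np.argsort(logit_effects)            # ascending order
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
direction = np.array([1.0, 0.0])
unembed = np.array([[0.5, -1.0, 2.0, 0.0],
                    [1.0, 1.0, 1.0, 1.0]])
neg, pos = top_logit_effects(direction, unembed, vocab, k=2)
print(neg)  # [('b', -1.0), ('d', 0.0)]
print(pos)  # [('c', 2.0), ('a', 0.5)]
```

Under this reading, a large positive value such as 1.39 for "necessarily" means the feature direction pushes that token's logit up directly, without accounting for later layers.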
Activation density: 0.144%
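Activation density is usually read as the fraction of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density`, the `threshold` parameter, and the toy activation array are hypothetical and do not reflect this feature's real data.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy data: 144 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:144] = 1.3
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # Activation density: 0.144%
```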