Explanations
phrases expressing preference or opposition
negations or phrases emphasizing the word "not."
Negative Logits
eur     -0.80
velt    -0.72
kamp    -0.69
ction   -0.68
itor    -0.66
ixel    -0.65
ç·      -0.63
Mehran  -0.63
riber   -0.62
lance   -0.62
Positive Logits
necessarily   1.40
icably        1.20
epad          1.09
icable        0.98
etheless      0.97
withstanding  0.94
bothering     0.92
orious        0.88
remotely      0.85
eworthy       0.81
Activations Density 0.091%