INDEX
Explanations
expressions of contrast or negation
negations or phrases expressing a lack of something
Negative Logits
former          -0.78
quet            -0.75
arling          -0.72
riber           -0.69
onic            -0.66
umbn            -0.66
ourses          -0.66
papers          -0.64
interstitial    -0.64
onding          -0.64
Positive Logits
uncommon        1.35
surprising      1.16
unreasonable    1.16
clear           1.09
icable          1.06
unusual         1.04
unheard         1.04
incon           1.04
coincidence     1.00
advisable       0.96
Activations Density: 0.083%