Explanations
negations
negations or phrases that denote a denial of something
Negative Logits
Token         Logit
kamp          -0.76
rift          -0.64
umbn          -0.61
覚醒          -0.60
papers        -0.58
igans         -0.58
stakes        -0.58
plate         -0.57
ancial        -0.56
iewicz        -0.54
Positive Logits
Token         Logit
withstanding   1.34
eworthy        1.30
surprisingly   1.23
ably           1.22
icing          1.17
orious         1.15
ices           1.01
everyone       1.01
only           0.97
icably         0.96
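The dashboard does not state how these two lists are produced. A common approach for sparse-autoencoder features, and the assumption behind this sketch, is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the resulting score. The function name `logit_effects` and the toy inputs are hypothetical, and the real dashboard may normalize or center the scores differently.

```python
# Hypothetical sketch (assumption): score each vocabulary token by the dot
# product of a feature's decoder direction with that token's unembedding
# column, then report the k most negative and k most positive tokens.
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most_negative, most_positive) lists of (token, score) pairs.

    decoder_dir has length d_model; unembed is d_model x vocab_size.
    """
    d_model = len(decoder_dir)
    # One score per token: how much this direction raises or lowers its logit.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    most_negative = sorted(pairs, key=lambda p: p[1])[:k]
    most_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 3-token vocabulary.
    neg, pos = logit_effects(
        [1.0, -0.5],
        [[0.2, -0.4, 1.0], [0.6, 0.4, -0.2]],
        ["a", "b", "c"],
        k=2,
    )
    print("negative:", neg)
    print("positive:", pos)
```

Ranking by raw dot product keeps the sketch simple; a dashboard could equally rank after subtracting the mean score, which would shift every value but leave the ordering unchanged.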
Activation Density: 0.057% (share of tokens on which the feature is active)
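Activation density is the fraction of tokens on which a feature fires, expressed as a percentage. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and its `threshold` parameter are hypothetical, not taken from the dashboard.

```python
# Hypothetical sketch: percentage of tokens whose activation exceeds a threshold.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly greater than threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 57 active tokens out of 100,000 gives the density shown above.
    acts = [0.0] * 99_943 + [1.0] * 57
    print(f"{activation_density(acts):.3f}%")
```

At 0.057%, the feature is active on roughly 57 of every 100,000 tokens, so it is a sparse feature.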