INDEX
Explanations
adverbs and adjectives that express critique or judgment
Head Attr Weights
0:0.11
1:0.05
2:0.07
3:0.24
4:0.02
5:0.04
6:0.02
7:0.09
8:0.04
9:0.02
10:0.21
11:0.04
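
The weights above can be ranked to see which heads dominate. The following is a minimal sketch, assuming the entries are per-attention-head attribution weights for heads 0 through 11 (the listing itself does not define them). The names `head_attr` and `top_heads` are hypothetical and introduced only for illustration.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each entry is the attribution share of one attention head.
head_attr = {
    0: 0.11, 1: 0.05, 2: 0.07, 3: 0.24, 4: 0.02, 5: 0.04,
    6: 0.02, 7: 0.09, 8: 0.04, 9: 0.02, 10: 0.21, 11: 0.04,
}


def top_heads(weights, k=2):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_attr)
total = sum(head_attr.values())          # listed values are rounded, so this need not be 1
share = sum(w for _, w in top) / total   # fraction of listed weight held by the top heads

print("top heads:", top)
print("total listed weight:", round(total, 2))
print("share held by top heads:", round(share, 2))
```

Under that assumption, heads 3 and 10 carry the largest weights in the listing.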
Negative Logits
ecause               -2.34
Semitism             -2.27
rients               -2.21
ername               -2.18
quickShipAvailable   -2.14
>.                   -2.07
ributes              -2.06
rities               -2.06
deen                 -2.04
Shares               -2.04
Positive Logits
improbable           2.70
unconventional       2.62
unlikely             2.59
complex              2.54
ingenious            2.25
labyrinth            2.24
elusive              2.20
seldom               2.19
questionable         2.18
convoluted           2.18
Activations Density 0.072%