INDEX
Explanations
words that imply a negative or dishonorable action or attribute
words related to dishonesty or dishonorable behavior
references to dishonesty and honorable actions, including high-stakes situations and individuals associated with them
Negative Logits
esta     -0.95
agy      -0.82
isine    -0.81
izens    -0.80
urgy     -0.80
rooms    -0.76
heast    -0.74
ele      -0.74
arian    -0.73
icles    -0.72
Positive Logits
RECT                       0.79
CLAIM                      0.77
é»Ĵ                        0.75
POSE                       0.75
BuyableInstoreAndOnline    0.75
åĬ                         0.71
ments                      0.70
omission                   0.69
\\\\\\\\\\\\\\\\           0.69
OUP                        0.67
Activations Density 0.033%
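
For readers unfamiliar with the density figure, it is usually read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are assumptions for illustration, not the dashboard's actual computation or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens with a non-zero
    activation; the dashboard's exact definition is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 tokens, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 2.7]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.033% would mean the feature is active on roughly one token in three thousand.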