INDEX
Explanations
adjectives related to characteristics of behavior or actions
terms related to fairness, impartiality, and complicity
Negative Logits
RAM                 -0.73
bum                 -0.72
Jensen              -0.67
\\\\\\\\\\\\\\\\    -0.67
trap                -0.66
clamation           -0.65
Grow                -0.65
hib                 -0.64
reat                -0.63
Bray                -0.62
Positive Logits
impartial           0.88
seys                0.88
ity                 0.86
complicity          0.79
Cosponsors          0.77
complicit           0.76
itous               0.74
spectator           0.72
postage             0.69
ences               0.69
Activations Density: 0.023%