INDEX
Explanations
phrases related to expressing approval or disapproval
expressions of approval or praise
Negative Logits
dunno           -0.82
crap            -0.73
nodd            -0.72
shit            -0.69
booze           -0.67
sort            -0.66
lol             -0.65
gul             -0.65
forgot          -0.64
?).             -0.64
Positive Logits
Cosponsors      0.89
unres           0.77
Members         0.71
strongly        0.71
ourselves       0.71
assurances      0.70
Chancellor      0.69
orough          0.67
Secretary       0.67
unequivocally   0.65
Activations Density 0.217%