INDEX
Explanations
verbs expressing willingness or agreement
positive emotions and expressions of agreement or satisfaction
Negative Logits
thumbnails    -0.81
Killer        -0.76
ulz           -0.71
verbs         -0.68
ictions       -0.67
GOODMAN       -0.66
holes         -0.65
threats       -0.65
oons          -0.65
elo           -0.64
Positive Logits
ãĤ©           0.88
welcomed      0.74
embraced      0.73
atered        0.72
bda           0.72
reunited      0.70
acknowledge   0.69
endorse       0.69
rejoice       0.68
congratulate  0.68
Activations Density: 0.035%