INDEX
Explanations
expressions of willingness or readiness
expressions of positive emotions or sentiments
Negative Logits
thumbnails    -0.85
Killer        -0.68
vocabulary    -0.66
GOODMAN       -0.65
causation     -0.64
verbs         -0.64
restraint     -0.64
causal        -0.64
onal          -0.63
specifics     -0.63
Positive Logits
ãĤ©           0.91
welcomed      0.90
embraced      0.88
accepted      0.83
endorse       0.83
greeted       0.81
reunited      0.79
endorsed      0.79
entertained   0.77
honoured      0.76
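
The top positive-logit token, ãĤ©, looks like a byte-level BPE vocabulary string rather than readable text. The sketch below shows one way to recover the underlying characters, assuming the model uses a GPT-2-style byte-to-unicode table; the page does not state which tokenizer is in use, so that is an assumption, and `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted into the range starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def decode_token(token: str) -> str:
    """Map a vocabulary string back to raw bytes, then decode as UTF-8."""
    byte_decoder = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(byte_decoder[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ©"))  # prints ォ (U+30A9, katakana small o)
```

Under that assumption the token is the three-byte UTF-8 sequence E3 82 A9, a single katakana character, so it is unrelated in form to the English verbs that follow it in the list.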
Activations Density 0.071%