INDEX
Explanations
texts related to making promises or commitments
words related to urging, requesting, or promoting actions
Negative Logits
Token       Logit
Craw        -0.68
hack        -0.66
武          -0.64
Poké        -0.63
Hollow      -0.61
Dortmund    -0.60
Dug         -0.60
Oval        -0.59
Buzz        -0.58
ynam        -0.56
Positive Logits
Token       Logit
agree       0.84
iment       0.82
beware      0.80
mercy       0.79
haps        0.79
iments      0.79
ingly       0.78
modesty     0.78
soever      0.77
waive       0.77
Activations Density: 0.252%