INDEX
Explanations
expressions of gratitude or requests for assistance
expressions of desire or preference
Negative Logits
VERTISEMENT  -0.79
ccording  -0.63
angular  -0.63
onut  -0.62
arding  -0.59
ulty  -0.58
Hazard  -0.58
ilian  -0.57
icol  -0.56
ious  -0.56
Positive Logits
to  0.71
clarification  0.70
revenge  0.69
lier  0.67
assurances  0.67
lihood  0.63
ANY  0.60
replicate  0.59
ĸļ  0.59
someone  0.58
Activations Density 0.033%