INDEX
Explanations
phrases indicating desire or preference
expressions of desire or need
Negative Logits
Notting     -0.66
ohyd        -0.65
osterone    -0.65
idelines    -0.63
iel         -0.63
estinal     -0.62
strate      -0.62
wald        -0.62
ormonal     -0.61
roach       -0.60
Positive Logits
sake           0.82
forgiveness    0.73
realism        0.71
revenge        0.67
apocalypse     0.65
attention      0.64
panties        0.64
clarity        0.64
haircut        0.63
daddy          0.63
Activations Density: 0.128%