Explanations
feeling compelled or pressured
Negative Logits
Token      Logit
Being      0.31
étant      0.29
BEING      0.29
Seorang    0.28
sendo      0.27
being      0.27
Being      0.26
beings     0.26
alguien    0.26
泹         0.25
Positive Logits
Token        Logit
like         0.50
tempted      0.41
compelled    0.38
embarrassed  0.36
ashamed      0.34
torn         0.34
intimidated  0.34
paralyzed    0.34
confused     0.33
choked       0.33
Activations Density: 0.009%