INDEX
Explanations
shock, surprise, and distress
Negative Logits
duyệt  0.28
reworking  0.27
표준  0.27
逛  0.27
ኖር  0.27
访问  0.26
谈  0.26
прогу  0.26
享受  0.26
ngày  0.26
Positive Logits
screamed  0.64
gasped  0.60
screams  0.58
scream  0.57
frantically  0.56
yelled  0.53
cried  0.52
trembled  0.52
screaming  0.52
yells  0.50
Activations Density 0.070%