INDEX
Explanations
pronoun followed by action verb
Negative Logits
हाद          0.69
cand         0.69
忐           0.66
milij        0.66
besø         0.66
敏感         0.66
malaise      0.66
葺           0.64
honeymoon    0.64
快適         0.64
Positive Logits
lung         1.33
roared       1.16
unleashing   1.08
roar         1.02
hurled       1.00
screamed     1.00
Lung         0.96
roaring      0.96
shouted      0.96
snar         0.95
Activations Density 0.132%