Explanations
emotional reactions and states
Negative Logits
ặp              0.95
controversial   0.81
contentious     0.78
scary           0.75
irstyle         0.75
intimidating    0.72
افه             0.72
erred           0.71
Scary           0.70
memungkinkan    0.70
Positive Logits
ingly           1.08
不已            0.99
watching        0.98
스러운          0.93
Watching        0.92
Knowing         0.89
Watching        0.89
speechless      0.89
knowing         0.88
thoughts        0.88
Activations Density 0.294%