INDEX
Explanations
expressions of fear and anxiety
fear and being scared
Negative Logits
Token         Logit
httphttps     -0.69
<unused51>    -0.60
<pad>         -0.60
[@BOS@]       -0.60
<unused47>    -0.60
<unused52>    -0.60
<unused41>    -0.60
<unused17>    -0.60
<unused8>     -0.60
<unused14>    -0.60
Positive Logits
Token         Logit
fear          1.02
fear          0.94
terrified     0.93
scared        0.93
frightened    0.90
scared        0.85
Fear          0.85
Fear          0.82
afraid        0.81
peur          0.80
Activations Density 0.039%
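
The density figure above is commonly read as the share of token positions on which the feature fires at all. The page does not say how it was computed, so the sketch below is only an illustration under that assumption: the function name, the zero threshold, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active. The threshold of 0.0 is illustrative, not a documented setting.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 4 of which are active.
toy = [0.0] * 9996 + [1.3, 0.7, 2.1, 0.4]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.040%
```

On this reading, a density of 0.039% would mean the feature is active on roughly 4 of every 10,000 tokens, which fits a feature that responds to a narrow topic such as fear.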