INDEX
Explanations
words related to fear and terror
Neuron Alignment
Index   Value   % of L₁
964     +0.10   0.3%
455     +0.09   0.3%
1960    +0.09   0.2%
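The "% of L₁" column is not defined on the page. One plausible reading, stated here as an assumption, is that it gives the absolute value of a single neuron's weight as a share of the L₁ norm (the sum of absolute values) of the whole feature direction. The sketch below illustrates that reading on a made-up four-element vector; the function name `l1_share` and the toy weights are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the L1 norm of `weights`.

    Assumption: this mirrors what a "% of L1" column might report.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy direction (not real feature weights).
toy_weights = [0.10, -0.30, 0.05, 0.55]
share_of_first = l1_share(toy_weights, 0)
print(f"{share_of_first:.1f}% of L1")
```

Under this reading, small percentages such as those in the table would indicate that the feature direction is spread across many neurons rather than concentrated in a few.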
Correlated Neurons
Index   P. Corr.   Cos Sim.
1553    +0.10      0.06
1960    +0.09      0.03
225     +0.09      0.03
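The two columns above report different similarity measures, assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity. The page does not say which vectors are being compared. As a rough illustration of how the two measures differ, the sketch below computes both on made-up data: Pearson correlation is the cosine similarity of the mean-centred vectors, so the two values generally disagree unless both vectors already have zero mean. All names and numbers here are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, computed as cosine similarity after mean-centring."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical toy data: y is an exact linear function of x (y = 2x + 1).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
print(pearson_correlation(x, y))  # perfectly correlated
print(cosine_similarity(x, y))    # close to, but below, 1 because of the offset
```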
Negative Logits
fta          -1.03
fign         -1.01
ftu          -1.00
NOO          -0.97
poff         -0.96
fup          -0.91
proprement   -0.91
paff         -0.91
unil         -0.91
aen          -0.90
Positive Logits
fear      1.00
afraid    0.96
fearing   0.93
fearful   0.88
fraid     0.84
fears     0.81
feared    0.81
scared    0.80
Fear      0.77
Fear      0.77
Activations Density 0.553%
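The page does not define activation density. A common reading, assumed here, is the percentage of sampled tokens on which the feature's activation is non-zero. The sketch below computes that quantity for a made-up list of activations; the function name, the threshold parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    if not activations:
        raise ValueError("no activations supplied")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 tokens, the feature fires on 5 of them.
toy_activations = [0.0] * 995 + [1.2, 0.4, 3.1, 0.7, 2.0]
print(f"{activation_density(toy_activations):.3f}%")
```

Under that reading, a density well below 1% would describe a sparse feature that fires on only a small fraction of tokens.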