INDEX
Explanations
mentions of religion, peace, fighters, shows, and guests
Neuron Alignment
Index    Value    % of L₁
455      +0.09    0.2%
1473     +0.07    0.2%
1531     +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
455      +0.09       0.02
1658     +0.07       0.03
1702     +0.07       0.03
Negative Logits
secon     -1.96
squa      -1.93
fte       -1.88
effe      -1.86
oner      -1.85
fta       -1.84
fup       -1.83
mef       -1.83
increa    -1.76
wien      -1.75
Positive Logits
would     0.85
must      0.81
wouldn    0.80
will      0.79
should    0.78
don       0.74
always    0.72
can       0.72
doesn     0.71
cannot    0.70
Activations Density: 0.248%