INDEX
Explanations
No Explanations Found
Neuron Alignment
Index    Value    % of L₁
674      +0.35    1.6%
1120     +0.04    0.2%
1048     +0.04    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
2        -0.35       0.00
0        -0.04       0.00
1        -0.04       0.00
Negative Logits
massacres     -0.86
,             -0.85
atrocities    -0.80
.             -0.79
for           -0.77
bombings      -0.77
не            -0.76
to            -0.76
but           -0.76
also          -0.75
Positive Logits
<bos>           10.71
dispen          2.18
expandindo      2.09
GEBURTSDATUM    2.03
ftu             2.01
fta             2.00
!...            1.94
»>              1.94
effe            1.90
embra           1.89
Activations Density: 0.000%
No Known Activations
This feature has no known activations.