INDEX
Explanations
the beginning of textual content ("<bos>")
Neuron Alignment
Index   Value   % of L₁
674     +0.44   12.6%
1741    +0.05   1.5%
381     +0.03   0.7%
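The dashboard does not state how the "% of L₁" column is derived. One plausible reading, assumed here and not confirmed by the source, is that each neuron's value is divided by the L₁ norm (the sum of absolute values) of the feature's full vector over neurons. The sketch below illustrates that reading on a toy vector; the function name `l1_share` and the toy values are hypothetical and are not taken from the dashboard.

```python
import numpy as np

def l1_share(vector, top_k=3):
    """Return (index, value, share of L1 norm) for the top_k largest-magnitude entries.

    Assumption: "% of L1" means |value| / sum(|all values|). The source does
    not confirm this definition.
    """
    vector = np.asarray(vector, dtype=float)
    l1_norm = np.abs(vector).sum()
    if l1_norm == 0:
        return []  # an all-zero vector has no meaningful shares
    # Sort indices by descending absolute value and keep the first top_k.
    order = np.argsort(-np.abs(vector))[:top_k]
    return [(int(i), float(vector[i]), float(abs(vector[i]) / l1_norm)) for i in order]

# Toy vector (hypothetical values); its L1 norm is 1.0.
toy = np.array([0.5, -0.25, 0.125, 0.125])
rows = l1_share(toy, top_k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a neuron with a large share carries most of the feature's weight, while many small shares indicate a feature spread across neurons.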
Correlated Neurons
Index   P. Corr.   Cos Sim.
674     +0.44      0.21
108     +0.05      0.57
1523    +0.03      0.42
Negative Logits
Token            Logit
massacres        -0.97
corruption       -0.86
atrocities       -0.86
bombings         -0.86
disinformation   -0.85
insurgents       -0.85
insurgency       -0.84
ABORT            -0.82
fascism          -0.81
ायर              -0.80
Positive Logits
Token             Logit
<bos>             16.55
expandindo        2.53
GEBURTSDATUM      2.48
betweenstory      2.38
encomp            2.34
Autoritní         2.33
intersper         2.31
Administrativna   2.28
تقاوى             2.27
dispen            2.26
Activations Density: 0.956%