Explanations
timestamps at the beginning of lines or phrases
Neuron Alignment
Index   Value   % of L₁
674     +0.28   5.8%
1741    +0.05   1.1%
50      +0.05   1.0%
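The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. That reading is an assumption, but it is consistent with the rows listed (0.28 at 5.8% implies an L₁ norm near 4.8). A minimal sketch under that assumption follows; the function name `l1_shares` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_shares(weights, top_k=3):
    """Return the top_k entries by absolute value as
    (index, value, share of the L1 norm) tuples.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank neurons by absolute weight, largest first (stable for ties).
    ranked = sorted(enumerate(weights), key=lambda iw: abs(iw[1]), reverse=True)
    return [(i, w, abs(w) / l1) for i, w in ranked[:top_k]]


# Toy vector for illustration only, not the feature's real weights.
toy = [0.5, -0.25, 0.125, 0.125]
for index, value, share in l1_shares(toy):
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Ranking by absolute value rather than signed value keeps strongly negative weights in the list, which matches the signed "Value" column shown in the table.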
Correlated Neurons
Index   P. Corr.   Cos Sim.
674     +0.28      0.12
1713    +0.05      0.75
108     +0.05      0.57
Negative Logits
massacres         -1.07
disinformation    -0.98
atrocities        -0.96
bombings          -0.94
insurgency        -0.93
fascism           -0.92
insurgents        -0.92
mismanagement     -0.92
bloodshed         -0.91
traitors          -0.90
Positive Logits
<bos>             16.99
expandindo        2.81
GEBURTSDATUM      2.78
betweenstory      2.68
Administrativna   2.58
Autoritní         2.57
dispen            2.56
تقاوى             2.54
Италијани         2.38
Italijani         2.37
Activations Density: 0.958%