INDEX
Explanations
webpage or post metadata and introductory information like dates, author names, and introductory paragraphs
Neuron Alignment
Index    Value    % of L₁
674      +0.44    11.4%
1741     +0.05    1.4%
876      +0.03    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
674      +0.44       0.21
108      +0.05       0.55
1523     +0.03       0.40
Negative Logits
massacres         -0.97
corruption        -0.87
atrocities        -0.86
bombings          -0.86
insurgents        -0.85
disinformation    -0.84
insurgency        -0.84
ABORT             -0.82
ायर               -0.81
fascism           -0.80
Positive Logits
<bos>              16.51
expandindo         2.52
GEBURTSDATUM       2.46
encomp             2.42
intersper          2.39
betweenstory       2.36
Autoritní          2.32
dispen             2.31
fuf                2.27
Administrativna    2.27
Activations Density 0.785%