INDEX
Explanations
documents related to diverse news topics
Neuron Alignment
Index   Value   % of L₁
674     +0.46   5.3%
1741    +0.07   0.8%
381     +0.04   0.5%
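
The "% of L₁" column appears to be each neuron's share of the L₁ norm of the whole alignment vector. That reading is an assumption, but the three rows are consistent with it: 0.46 at 5.3% implies an L₁ norm of roughly 8.7, and 0.07 and 0.04 divided by that norm round to 0.8% and 0.5%. A minimal sketch of the arithmetic, using a hypothetical helper `percent_of_l1` and toy values that are not taken from the dashboard:

```python
def percent_of_l1(weights):
    """Return each entry's share of the vector's L1 norm, in percent."""
    l1 = sum(abs(w) for w in weights)  # L1 norm: sum of absolute values
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1 for w in weights]


# Toy vector (hypothetical values chosen so the arithmetic is exact).
toy = [0.5, -0.25, 0.25]
shares = percent_of_l1(toy)
print(shares)
```

The sign of a value does not affect its share, because only absolute values enter the norm.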
Correlated Neurons
Index   P. Corr.   Cos Sim.
1403    +0.46      0.24
15      +0.07      0.26
674     +0.04      0.25
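
The two columns can disagree because they measure different things. Assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity (the source does not say which vectors the dashboard compares), the sketch below shows the difference: Pearson correlation is cosine similarity applied after subtracting each vector's mean. The function names and toy vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a],
                             [y - mean_b for y in b])


# Toy vectors (hypothetical): perfectly anti-correlated, yet both all-positive,
# so the cosine similarity stays positive while the correlation is -1.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
cos_uv = cosine_similarity(u, v)
corr_uv = pearson_correlation(u, v)
print(cos_uv, corr_uv)
```

Centering is what lets a correlation near zero coexist with a clearly positive cosine similarity, as in the rows above.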
Negative Logits
unlaw        -1.65
belliger     -1.56
disgra       -1.54
despotism    -1.50
unspeak      -1.47
impractica   -1.47
ruinous      -1.38
ineffec      -1.33
nukes        -1.31
reluct       -1.29
Positive Logits
<bos>             14.11
expandindo        2.33
GEBURTSDATUM      2.33
betweenstory      2.28
Autoritní         2.28
تقاوى             2.04
Administrativna   1.97
kasarigan         1.96
Italijani         1.93
autorytatywna     1.90
Activations Density 0.085%