INDEX
Explanations
calls to action or instructions for contributing or supporting
Neuron Alignment
Index    Value    % of L₁
674      +0.33    2.1%
1120     +0.07    0.5%
478      +0.06    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1403     +0.33       0.23
800      +0.07       0.27
15       +0.06       0.26
Negative Logits
belliger       -1.59
ruinous        -1.49
despotism      -1.47
unspeak        -1.43
ineffectual    -1.41
demoral        -1.40
miscon         -1.38
massacres      -1.37
exasper        -1.32
nukes          -1.32
Positive Logits
<bos>           13.24
GEBURTSDATUM    2.43
expandindo      2.33
betweenstory    2.28
Autoritní       2.24
تقاوى           1.95
kasarigan       1.93
kaarangay       1.85
'\\;'           1.84
Italijani       1.82
Activations Density: 0.086%