INDEX
Explanations
references to the concept of the universe
Neuron Alignment
Index   Value   % of L₁
687     +0.18   0.8%
1602    +0.12   0.5%
555     +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1438    +0.18      0.02
1602    +0.12      0.02
687     +0.12      0.02
Negative Logits
<bos>         -2.14
żdy           -0.68
reaff         -0.62
renounced     -0.56
sell          -0.56
familiarize   -0.56
inaugurate    -0.55
sharpen       -0.54
manage        -0.54
retorted      -0.54
Positive Logits
universe   1.17
universe   1.10
Universe   1.06
Universe   1.04
rispond    0.94
kram       0.92
Lombar     0.92
vespa      0.91
Karang     0.90
alpes      0.89
Activations Density: 0.207%