Explanations
text related to business partnerships and formal announcements
Neuron Alignment
Index   Value   % of L₁
478     +0.17   0.5%
2034    +0.16   0.5%
1535    +0.16   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1150    +0.17      0.04
1535    +0.16      0.06
478     +0.16      0.05
Negative Logits
Și        -1.00
Když      -0.98
Zgod      -0.96
Endlich   -0.95
Leurs     -0.94
Selama    -0.92
Przyp     -0.91
Dijo      -0.90
Díky      -0.86
Czym      -0.86
Positive Logits
<eos>      1.25
<bos>      1.06
↵↵↵        0.81
↵↵         0.77
↵↵↵↵       0.75
↵↵↵↵↵      0.71
↵↵↵↵↵↵↵    0.64
↵↵↵↵↵↵     0.63
↵↵↵↵↵↵↵↵   0.63
↵          0.62
Activations Density: 0.511%