INDEX
Explanations
references to the concept of the Devil
Neuron Alignment
Index    Value    % of L₁
68       +0.14    0.8%
370      +0.13    0.7%
687      +0.13    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
227      +0.14       0.06
1741     +0.13       0.00
1870     +0.13       0.02
Negative Logits
Token        Logit
<bos>        -1.54
Примеча      -0.68
RunWith      -0.67
Asoci        -0.66
Hå           -0.65
Ingredi      -0.65
Continu      -0.65
Enllaces     -0.64
elemField    -0.64
ുറ           -0.64
Positive Logits
Token        Logit
Montagne     1.11
.-"          0.98
?...         0.95
miu          0.95
!'           0.95
!...         0.94
nant         0.94
salat        0.93
lii          0.93
frankfurt    0.93
Activations Density: 0.397%