INDEX
Explanations
instances of addressing individuals or referring to personal actions
Neuron Alignment
Index    Value    % of L₁
50       +0.31    1.2%
1757     +0.10    0.4%
2019     +0.09    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1757     +0.31       0.05
946      +0.10       0.03
814      +0.09       0.03
Negative Logits
Token      Value
<bos>      -2.19
<?         -0.87
<?         -0.86
ⓧ          -0.81
           -0.78
/***       -0.72
/**        -0.60
disbur     -0.58
})();      -0.58
/**        -0.57
Positive Logits
Token      Value
véhic      1.09
soulign    1.03
Juf        1.02
Minang     0.96
pollut     0.93
délib      0.93
quoique    0.93
Jambi      0.92
Khart      0.91
déliv      0.91
Activations Density: 0.164%