INDEX
Explanations
references to responses and comments, particularly in the context of public or official statements
Neuron Alignment
Index    Value    % of L₁
50       +0.23    0.9%
1013     +0.12    0.5%
1150     +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
509      +0.23       0.07
1013     +0.12       0.08
284      +0.11       0.06
Negative Logits
<bos>        -2.63
?...         -1.09
!...         -1.03
encre        -0.97
fuf          -0.93
desir        -0.91
intersper    -0.91
emphat       -0.90
embra        -0.89
!?           -0.86
Positive Logits
except           0.79
:"-              0.74
unless           0.70
nor              0.69
except           0.69
anymore          0.65
JSONException    0.64
Neither          0.63
Except           0.63
Except           0.63
Activations Density 0.511%