INDEX
Explanations
references to historical events and figures, as well as locations and dates
Neuron Alignment
Index    Value    % of L₁
50       +0.19    0.7%
1980     +0.07    0.3%
1055     +0.06    0.2%
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
882      +0.19            0.04
509      +0.07            0.03
527      +0.06            0.03
Negative Logits
Token              Logit
<bos>              -2.72
public             -0.67
ConstraintMaker    -0.66
struct             -0.64
mergeFrom          -0.61
об                 -0.60
addComponent       -0.59
earn               -0.59
prepare            -0.59
Autoritní          -0.59
Positive Logits
Token      Logit
affor      1.77
increa     1.68
wherea     1.66
inev       1.66
reluct     1.63
emphat     1.63
accla      1.62
disagre    1.61
unden      1.61
squa       1.60
Activations Density 0.290%