Explanations
research-related terms and concepts
Neuron Alignment
Index   Value   % of L₁
1241    +0.08   0.2%
872     +0.08   0.2%
1129    +0.07   0.2%
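The Neuron Alignment table lists the neurons that carry the most weight in the feature's decoder direction. Below is a minimal sketch of how such a table could be computed. It assumes that "Value" is the decoder weight on each neuron and "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder vector. Ranking by absolute weight is also an assumption. The helper name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Hypothetical helper: return (neuron index, decoder weight, share of L1 norm)
    for the k neurons with the largest absolute decoder weight.

    Assumes "% of L1" = |weight| / sum(|weights|) over the feature's decoder vector.
    """
    l1 = sum(abs(w) for w in decoder_row)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Sort neuron indices by absolute weight, largest first (ties keep index order).
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    toy_row = [0.5, -0.1, 0.2, 0.0, -0.2]  # made-up 5-neuron decoder vector
    for idx, value, share in neuron_alignment(toy_row, k=3):
        print(f"{idx:<6}{value:+.2f}   {share:.1%}")
```

A share of 0.2% for each top neuron, as in the table above, would mean that no single neuron dominates the direction. Under this reading, the feature is spread across many neurons.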
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
378     +0.08           0.04
581     +0.08           0.05
1241    +0.07           0.05
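The Correlated Neurons table reports two similarity scores per neuron. The sketch below shows both. It assumes each score compares the feature's activations with one neuron's activations over the same tokens. Pearson correlation centers both series before comparing them, while cosine similarity does not. That difference matters for a sparse feature whose activations are mostly zero. The function names and toy activations are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x)) * math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / denom if denom else math.nan  # undefined if either series is constant


def cosine_similarity(x, y):
    """Uncentered similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / denom if denom else math.nan


if __name__ == "__main__":
    feature_acts = [0.0, 1.2, 0.0, 0.0, 0.8, 0.0]   # made-up sparse feature activations
    neuron_acts = [0.3, 0.9, -0.1, 0.2, 0.4, 0.1]   # made-up neuron activations
    print(pearson(feature_acts, neuron_acts), cosine_similarity(feature_acts, neuron_acts))
```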
Negative Logits
Mlle       -0.96
encomp     -0.94
indestru   -0.93
fortn      -0.93
accla      -0.88
inconce    -0.88
increa     -0.87
unve       -0.84
philanth   -0.83
gardent    -0.82
Positive Logits
InjectAttribute   0.62
principalTable    0.59
KEYCODE           0.58
censiti           0.56
MessageOf         0.56
currently         0.56
really            0.56
adequately        0.55
nor               0.55
truly             0.54
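The Negative Logits and Positive Logits lists rank vocabulary tokens by how much the feature pushes their output logits down or up. Below is a minimal sketch. It assumes each score is the dot product of the feature's decoder direction with the corresponding column of the model's unembedding matrix, with no final LayerNorm applied. The helper `logit_effects`, the toy unembedding, and the toy vocabulary are all hypothetical.

```python
def logit_effects(decoder_row, unembed, vocab, k=3):
    """Hypothetical helper: score every token as decoder_row . unembed[:, token].

    decoder_row: list of d_model floats (the feature's decoder direction)
    unembed:     d_model rows, each a list of len(vocab) floats
    Returns (k most negative, k most positive) lists of (token, score).
    """
    d_model, n_vocab = len(decoder_row), len(vocab)
    scores = [
        sum(decoder_row[d] * unembed[d][t] for d in range(d_model))
        for t in range(n_vocab)
    ]
    order = sorted(range(n_vocab), key=lambda t: scores[t])  # ascending by score
    bottom = [(vocab[t], scores[t]) for t in order[:k]]
    top = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return bottom, top


if __name__ == "__main__":
    toy_unembed = [[0.5, -0.3, 0.1], [0.2, 0.0, -0.4]]  # made-up 2 x 3 unembedding
    print(logit_effects([1.0, 0.5], toy_unembed, ["alpha", "beta", "gamma"], k=1))
```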
Activation Density: 0.489%
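Activation density summarizes how often the feature fires. A minimal sketch follows, assuming density means the fraction of dataset tokens on which the feature's activation is strictly above zero. The helper name and the threshold parameter are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    if not acts:
        raise ValueError("no activations given")
    return sum(1 for a in acts if a > threshold) / len(acts)


if __name__ == "__main__":
    toy_acts = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]  # made-up activations
    print(f"{activation_density(toy_acts):.3%}")
```

Under this definition, a density of 0.489% means the feature is active on roughly 4.89 tokens per 1,000.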