INDEX
Explanations
phrases related to the composition or content of a group or system
Neuron Alignment
Index   Value   % of L₁
783     +0.07   0.3%
313     +0.07   0.3%
1047    +0.07   0.2%
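Here is a minimal sketch of how a table like this could be produced. It assumes two things the source does not state: "Value" is a neuron's weight in the feature's decoder vector, and "% of L₁" is that weight's absolute value divided by the vector's L1 norm. The helper `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Rank neuron indices by |weight| in one feature's decoder vector.

    Hypothetical helper: returns (index, weight, share_of_l1) tuples, where
    share_of_l1 = |weight| / sum(|w| for w in decoder_row).
    """
    l1 = sum(abs(w) for w in decoder_row)
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy decoder vector (made-up numbers, not this feature's weights).
print(neuron_alignment([0.5, -2.0, 1.5, 0.0], k=2))
# [(1, -2.0, 0.5), (2, 1.5, 0.375)]
```

Under this reading, a weight of +0.07 that accounts for only 0.3% of the L1 norm means the feature's decoder direction is spread across many neurons rather than aligned with any single one.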
Correlated Neurons
Index   P. Corr.   Cos Sim.
206     +0.07      0.03
861     +0.07      0.02
261     +0.07      0.02
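The two columns above can be illustrated with a short sketch. It assumes that "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and that "Cos Sim." is the cosine similarity between two weight vectors (for example, the feature's decoder vector and a neuron's weights). Both readings are assumptions, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy data: perfectly correlated activations, orthogonal weight vectors.
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine([1.0, 0.0], [0.0, 1.0]))
```

Values near zero on both measures, as in the table, would suggest that no single neuron tracks this feature closely.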
Negative Logits
<bos>         -1.19
public        -0.84
ตร์            -0.76
private       -0.76
return        -0.74
@             -0.74
Referències   -0.74
util          -0.72
lib           -0.72
int           -0.72
Positive Logits
maneu    2.52
reluct   2.49
fortn    2.45
shenan   2.44
increa   2.43
depic    2.37
strick   2.37
wherea   2.34
accla    2.34
affor    2.33
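The following sketch shows one way to produce the two logit lists above. It assumes, without confirmation from the source, that each token's value is the direct effect of the feature's decoder direction on that token's output logit, computed by multiplying the direction through the unembedding matrix. `logit_effects`, `top_logits`, the row-major `unembed` layout, and the toy weights are all hypothetical.

```python
def logit_effects(decoder_row, unembed, vocab):
    """Direct effect of a feature's decoder direction on each output token.

    unembed[j][t] is the weight from residual dimension j to token t
    (a hypothetical row-major layout chosen for this sketch).
    """
    return {
        tok: sum(decoder_row[j] * unembed[j][t] for j in range(len(decoder_row)))
        for t, tok in enumerate(vocab)
    }


def top_logits(effects, k=10):
    """Split tokens into the k most negative and the k most positive effects."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    return ranked[:k], ranked[::-1][:k]


effects = logit_effects([1.0, -1.0],
                        [[1.0, 0.0, 2.0], [0.0, 1.0, 0.5]],
                        ["a", "b", "c"])
negative, positive = top_logits(effects, k=1)
print(negative, positive)  # [('b', -1.0)] [('c', 1.5)]
```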
Activation Density: 0.077%
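Here is a minimal sketch of the density figure. It assumes density is the fraction of sampled tokens on which the feature's activation exceeds zero. Under that reading, 0.077% corresponds to roughly 77 of every 100,000 tokens. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: 77 active tokens out of 100,000.
sample = [0.0] * 99_923 + [1.0] * 77
print(f"{activation_density(sample):.3%}")  # 0.077%
```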