INDEX
Explanations
directions or steps related to a specific process or task
Neuron Alignment
Index    Value    % of L₁
50       +0.10    0.4%
874      +0.07    0.3%
169      +0.06    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
874      +0.10       0.04
451      +0.07       0.04
1436     +0.06       0.04
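The two columns above measure different things. The sketch below is illustrative only: it assumes "P. Corr." is a Pearson correlation between two activation series and "Cos Sim." is the cosine similarity between two direction vectors, which the dashboard itself does not state. The function names are hypothetical and the inputs in the usage note are toy values, not data from this page.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Center both series, then divide the summed cross-products
    # by the product of the centered norms.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return cov / norm


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

For example, `pearson([1, 2, 3], [2, 4, 6])` is 1 up to floating-point error, and `cosine_similarity([1, 0], [0, 1])` is 0. Pearson correlation subtracts the mean before comparing, while cosine similarity does not, so the two columns need not agree.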
Negative Logits
Token        Logit
<bos>        -1.39
<?           -0.81
ⓧ            -0.77
<?           -0.75
             -0.75
public       -0.72
protected    -0.68
             -0.67
/**          -0.67
ൊ            -0.66
Positive Logits
Token      Logit
affor      2.09
maneu      1.99
accla      1.93
increa     1.88
disagre    1.85
impra      1.83
Plan       1.81
Juf        1.81
wien       1.78
excru      1.77
Activations Density: 0.107%