Explanations
conjunctions indicating alternatives or options
Neuron Alignment
Index   Value   % of L₁
50      +0.14   0.5%
776     +0.05   0.2%
1363    +0.05   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1704    +0.14      0.03
749     +0.05      0.02
490     +0.05      0.03
Negative Logits
<bos>         -2.47
<?            -0.72
ोंने           -0.71
/**           -0.70
              -0.69
GetMapping    -0.68
########.     -0.68
began         -0.67
createState   -0.67
ⓧ             -0.67
Positive Logits
maneu       2.19
stockholm   2.09
affor       2.04
wien        2.04
accla       2.00
increa      1.99
disagre     1.97
impra       1.94
emphat      1.94
inev        1.93
Activations Density 0.077%