INDEX
Explanations
content with a mix of personal reflections and specific details
Neuron Alignment
Index   Value   % of L₁
50      +0.14   0.9%
421     +0.13   0.8%
1127    +0.12   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
421     +0.14      0.02
1127    +0.13      0.02
2034    +0.12      0.02
Negative Logits
<bos>          -2.90
ⓧ              -0.95
<?             -0.83
/**            -0.78
/***           -0.67
Сол            -0.65
Normdatei      -0.62
springfox      -0.62
bezeichneter   -0.61
Высота         -0.60
Positive Logits
'..            0.92
velours        0.89
">...          0.88
)..            0.87
'..            0.85
véhic          0.84
..             0.83
"..            0.82
maroc          0.80
impractica     0.80
Activations Density: 0.079%