INDEX
Explanations
text related to website organization and documentation structure
Neuron Alignment
Index    Value    % of L₁
1624     +0.13    0.5%
410      +0.11    0.4%
411      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1624     +0.13       0.02
411      +0.11       0.02
1950     +0.11       0.02
Negative Logits
Odpo         -0.52
Leia         -0.49
Agnew        -0.49
Pře          -0.43
zurück       -0.43
Finlay       -0.42
Broughton    -0.42
Grenada      -0.41
Terug        -0.41
tarjetas     -0.41
Positive Logits
ļ         0.78
',{       0.78
tanong    0.75
->{       0.74
makita    0.73
':{       0.72
>{        0.72
",{       0.72
guma      0.71
silang    0.71
Activations Density 0.070%