INDEX
Explanations
words related to labeling and mislabeling in various contexts
Neuron Alignment
Index   Value   % of L₁
1271    +0.19   0.7%
1272    +0.13   0.5%
1895    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1271    +0.19      0.03
1272    +0.13      0.02
1622    +0.12      0.02
Negative Logits
Méri        -0.57
récomp      -0.52
undai       -0.48
viu         -0.48
OTTO        -0.47
essandro    -0.47
dirond      -0.47
Dorothea    -0.47
Piac        -0.46
réun        -0.45
Positive Logits
label       1.56
labels      1.49
label       1.39
Label       1.38
Labels      1.33
labels      1.31
Label       1.29
LABEL       1.28
labeling    1.26
LABEL       1.22
Activations Density: 0.078%