Explanations
terms related to systematic approaches
New Auto-Interp
Neuron Alignment
Index  Value  % of L₁
198    +0.15  0.8%
92     +0.11  0.6%
327    +0.11  0.6%
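The alignment figures can be recomputed from a feature's decoder vector. This is a minimal sketch, assuming the dashboard ranks neurons by the magnitude of their decoder weight and that "% of L₁" means |weight| divided by the L₁ norm of the whole decoder vector. Both readings are assumptions, and the helper name `neuron_alignment` is hypothetical.

```python
from typing import List, Tuple


def neuron_alignment(decoder_vec: List[float], k: int = 3) -> List[Tuple[int, float, float]]:
    """Return the k neurons with the largest-magnitude decoder weights.

    Each entry is (neuron_index, weight, percent_of_l1). Ranking by |weight|
    and this "% of L1" definition are assumptions, not a documented method.
    """
    l1 = sum(abs(w) for w in decoder_vec)  # L1 norm of the whole decoder vector
    # Neuron indices sorted by absolute weight, largest first
    order = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in order[:k]]


# Toy 4-neuron decoder vector (made-up values whose L1 norm is exactly 1.0)
for idx, weight, pct in neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2):
    print(f"{idx:>4}  {weight:+.2f}  {pct:.1f}%")
```

With the toy vector this prints neuron 1 (-0.50, 50.0%) followed by neuron 0 (+0.25, 25.0%). Ranking by magnitude means a large negative weight can top the list; if the dashboard ranks by signed value instead, the `key` would drop the `abs`.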
Correlated Neurons
Index  Pearson Corr.  Cosine Sim.
198    +0.15          0.02
148    +0.11          0.00
424    +0.11          0.01
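A sketch of how such a ranking could be produced, reading the correlation column as the Pearson correlation between the feature's activations and each neuron's activations over the same tokens (an assumption). Which pair of vectors the cosine-similarity column compares is not stated on this page, so `cosine_similarity` is left as a generic helper. All function names here are hypothetical.

```python
import math
from typing import List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / (sx * sy)


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def correlated_neurons(
    feature_acts: List[float], neuron_acts: List[List[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Rank neurons by the Pearson correlation of their activations with the feature's.

    neuron_acts[i] holds neuron i's activations on the same tokens as feature_acts.
    """
    scores = [(i, pearson(feature_acts, acts)) for i, acts in enumerate(neuron_acts)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


# Toy data: one feature and three neurons observed on four tokens (made-up values)
feature = [0.0, 1.0, 2.0, 3.0]
neurons = [[3.0, 2.0, 1.0, 0.0], [0.0, 2.0, 4.0, 6.0], [1.0, 1.0, 1.0, 1.0]]
print(correlated_neurons(feature, neurons, k=2))
```

Here neuron 1 rises in lockstep with the feature and ranks first, the constant neuron 2 scores 0.0, and the anti-correlated neuron 0 falls off the top two.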
Negative Logits
mismo     -1.93
aceae     -1.70
inois     -1.69
skilled   -1.66
behalf    -1.52
pills     -1.46
brave     -1.42
inea      -1.41
owed      -1.40
wegian    -1.40
Positive Logits
ignment   1.72
ity       1.57
°         1.57
"}](#     1.50
underest  1.50
ķ         1.47
essment   1.46
itat      1.43
ities     1.42
sheet     1.42
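The two logit lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. A minimal sketch, assuming the feature's output has already been mapped to a single residual-stream direction and that its effect is read directly through the unembedding while ignoring any final LayerNorm. The function name and the `unembed[row][token]` layout are hypothetical. In a real model `vocab` is the tokenizer's vocabulary, which is why entries such as "ignment" and "essment" are sub-word fragments.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]


def top_logit_effects(
    direction: List[float], unembed: List[List[float]], vocab: List[str], k: int = 10
) -> Tuple[Scored, Scored]:
    """Score each token by the dot product of `direction` with its unembedding column.

    direction: the feature's output direction in the residual stream (length d_model).
    unembed:   d_model x vocab_size matrix; unembed[r][t] is row r, token t.
    Returns (most_positive, most_negative), each a list of (token, score).
    Any final LayerNorm before the unembedding is ignored (an assumption).
    """
    scores = [
        (tok, sum(direction[r] * unembed[r][t] for r in range(len(direction))))
        for t, tok in enumerate(vocab)
    ]
    ranked = sorted(scores, key=lambda s: s[1])  # most negative first
    return ranked[::-1][:k], ranked[:k]


# Toy model: d_model = 2, three-token vocabulary (made-up weights)
pos, neg = top_logit_effects([1.0, -1.0], [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]], ["a", "b", "c"], k=2)
print("positive:", pos)
print("negative:", neg)
```

Only a direction and a matrix product are needed, so the lists depend on the feature's output weights alone and not on any particular input text.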
Activation Density: 0.147%
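Activation density is the share of tokens on which the feature fires; 0.147% works out to roughly 1.5 active tokens per 1,000. A minimal sketch, assuming "fires" means an activation strictly above zero. The `threshold` parameter and the function name are hypothetical.

```python
from typing import Iterable


def activation_density(acts: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in acts:
        total += 1
        if a > threshold:
            active += 1
    return 100.0 * active / total if total else 0.0  # 0.0 for an empty input


# Made-up activations: the feature fires on 3 of 2,000 tokens
acts = [0.0] * 1997 + [1.2, 0.4, 3.1]
print(f"{activation_density(acts):.3f}%")  # prints 0.150%
```

The function takes any iterable, so it can consume activations streamed over a large token set without holding them all in memory.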