INDEX
Explanations
parentheses and their corresponding closing brackets
Neuron Alignment
Index   Value   % of L₁
470     +0.17   0.9%
190     +0.14   0.8%
139     +0.14   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
250     +0.17      0.20
412     +0.14      0.15
258     +0.14      0.14
Negative Logits
holders     -1.63
ños         -1.59
mable       -1.49
survives    -1.48
bys         -1.47
yel         -1.42
yll         -1.36
alive       -1.32
gers        -1.32
salv        -1.32
Positive Logits
nai         2.03
eness       1.74
myself      1.64
ubicin      1.51
iance       1.43
other       1.38
sponsored   1.38
isco        1.30
support     1.29
others      1.28
Activations Density: 0.325%