INDEX
Explanations
references to specific file paths or structures
Neuron Alignment
Index   Value   % of L₁
156     +0.26   1.5%
369     +0.15   0.9%
466     +0.15   0.9%
Correlated Neurons
Index   P. Corr.   Cos Sim.
17      +0.26      0.00
73      +0.15      0.01
369     +0.15      0.01
Negative Logits
opters      -1.61
.]{}        -1.56
uro         -1.52
ONT         -1.46
romycin     -1.46
Algorithm   -1.43
udson       -1.41
ousseau     -1.39
ruitment    -1.36
PLIED       -1.35
Positive Logits
soda        1.67
rell        1.58
icas        1.56
hammer      1.56
dom         1.55
bar         1.54
leg         1.53
cigar       1.47
bell        1.45
fight       1.44
Activations Density: 0.114%