INDEX
Explanations
terms related to exploration and discovery
Neuron Alignment
Index   Value   % of L₁
412     +0.17   1.0%
115     +0.13   0.7%
148     +0.12   0.7%
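One way to read the Neuron Alignment table is as the largest entries of the feature's decoder vector in the MLP neuron basis, with each entry also shown as a share of that vector's L₁ norm. The sketch below assumes exactly that definition; it is not taken from the dashboard's source. The helper `neuron_alignment` and the toy four-neuron decoder vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k largest decoder weights as (neuron index, value, % of L1 norm).

    Assumes `decoder_vec` is the feature's decoder vector expressed in the
    MLP neuron basis (one weight per neuron).
    """
    l1_norm = sum(abs(w) for w in decoder_vec)
    # Indices of the k most positive weights, largest first.
    top = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)[:k]
    return [(i, decoder_vec[i], 100 * decoder_vec[i] / l1_norm) for i in top]


# Toy 4-neuron decoder vector; its L1 norm is 1.0.
print(neuron_alignment([0.0, 0.5, -0.25, 0.25], k=2))
# [(1, 0.5, 50.0), (3, 0.25, 25.0)]
```

Under this reading, a weight of 0.17 that makes up 1.0% of the L₁ norm implies a total L₁ norm of roughly 17. The feature would then be spread thinly across many neurons rather than aligned with any single one.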
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
412     +0.17           0.06
148     +0.13           0.02
392     +0.12           0.05
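The Correlated Neurons table reports two similarity scores for each neuron. The sketch below assumes both scores compare the feature's activations with a neuron's activations over the same tokens; the dashboard does not state this. Pearson correlation centers both series before comparing them, while cosine similarity does not, so the two columns can differ. If the dashboard instead compares weight vectors, only `cosine_sim` carries over. Both helper names are hypothetical.

```python
import math


def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cosine_sim(x, y):
    """Cosine similarity of the raw (uncentered) series."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


feature_acts = [1.0, 2.0, 3.0]
neuron_acts = [11.0, 12.0, 13.0]  # same pattern, shifted by a constant offset
print(pearson_corr(feature_acts, neuron_acts))  # 1.0
print(round(cosine_sim(feature_acts, neuron_acts), 3))  # 0.949
```

In the example, the constant offset leaves the Pearson correlation at 1 but lowers the cosine similarity.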
Negative Logits
acs             -1.68
oddsidemargin   -1.59
imes            -1.56
hers            -1.55
aho             -1.53
helm            -1.47
ensen           -1.45
arman           -1.44
oxide           -1.42
eless           -1.42
Positive Logits
depths          1.74
tain            1.61
unreadable      1.60
feasibility     1.57
tunnels         1.52
TeV             1.50
gaps            1.43
trap            1.41
how             1.39
possibilities   1.38
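The two logit lists rank vocabulary tokens by how strongly the feature's output direction pushes their logits down or up. The sketch below shows that computation under two assumptions: the effect is the decoder vector projected directly through the unembedding matrix W_U, and the final layer norm is ignored. `logit_effects`, the toy matrix, and the four-token vocabulary are all hypothetical.

```python
def logit_effects(decoder_vec, W_U, vocab, k=2):
    """Rank tokens by the direct logit effect of a feature's decoder vector.

    W_U is given as one row per residual-stream dimension, each row holding one
    entry per vocabulary token. The final layer norm is ignored (an assumption).
    """
    d_model, n_vocab = len(decoder_vec), len(vocab)
    logits = [sum(decoder_vec[d] * W_U[d][t] for d in range(d_model)) for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: logits[t])  # most negative first
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive


W_U = [[1.0, -2.0, 0.5, -0.5],  # residual dimension 0
       [9.0, 9.0, 9.0, 9.0]]    # residual dimension 1 (unused by this decoder vector)
neg, pos = logit_effects([1.0, 0.0], W_U, ["a", "b", "c", "d"])
print(neg)  # [('b', -2.0), ('d', -0.5)]
print(pos)  # [('a', 1.0), ('c', 0.5)]
```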
Activation Density: 0.439%
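Activation density is the fraction of tokens on which the feature fires. If "fires" means an activation above zero (an assumption), a density of 0.439% corresponds to about 4.4 active tokens per thousand. A minimal sketch with a hypothetical helper:

```python
def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    n_active = sum(1 for a in acts if a > threshold)
    return 100 * n_active / len(acts)


print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 25.0
```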