INDEX
Explanations
**verbs related to discovery or formation**
Neuron Alignment

| Index | Value | % of L₁ |
|-------|-------|---------|
| 549   | +0.10 | 0.3%    |
| 1413  | +0.10 | 0.3%    |
| 1110  | +0.09 | 0.3%    |

Correlated Neurons

| Index | P. Corr. | Cos Sim. |
|-------|----------|----------|
| 1413  | +0.10    | 0.03     |
| 549   | +0.10    | 0.04     |
| 559   | +0.09    | 0.04     |

Negative Logits

| Token    | Logit |
|----------|-------|
| depic    | -1.93 |
| reluct   | -1.89 |
| increa   | -1.88 |
| encomp   | -1.84 |
| impra    | -1.81 |
| shenan   | -1.80 |
| maneu    | -1.77 |
| unve     | -1.75 |
| indestru | -1.75 |
| inev     | -1.73 |

Positive Logits

| Token   | Logit |
|---------|-------|
| find    | 1.01  |
| find    | 1.00  |
| finds   | 0.88  |
| found   | 0.87  |
| Find    | 0.84  |
| FIND    | 0.84  |
| Find    | 0.82  |
| found   | 0.80  |
| finding | 0.79  |
| finder  | 0.77  |

Activations Density 0.221%