Explanations
terms related to processes of emitting or producing something
Neuron Alignment
Index   Value   % of L₁
376     +0.23   1.3%
156     +0.20   1.2%
148     +0.15   0.9%
Correlated Neurons
Index   P. Corr.   Cos Sim.
11      +0.23      0.01
308     +0.20      0.01
463     +0.15      0.01
Negative Logits
Token     Value
ydr       -1.66
ours      -1.34
ress      -1.33
rained    -1.30
going     -1.29
resses    -1.28
dorff     -1.28
iele      -1.27
brewing   -1.27
dered     -1.26
Positive Logits
Token         Value
puff          1.51
wing          1.50
presum        1.41
gun           1.37
minute        1.34
vest          1.32
buzz          1.32
vital         1.29
obstruction   1.28
signal        1.25
Activations Density: 0.006%