INDEX
Explanations
phrases related to significant consequences or outcomes
Neuron Alignment
Index  Value  % of L₁
410    +0.21  1.2%
321    +0.15  0.9%
376    +0.14  0.8%
Correlated Neurons
Index  P. Corr.  Cos Sim.
410    +0.21     0.01
111    +0.15     0.01
321    +0.14     0.01
Negative Logits
mente         -2.27
respectively  -1.90
matter        -1.77
etc           -1.72
ize           -1.62
age           -1.51
iverse        -1.49
noreply       -1.48
izes          -1.45
izer          -1.43
Positive Logits
burst       1.63
remarks     1.62
heap        1.46
cke         1.45
odem        1.44
posted      1.43
lay         1.43
conjecture  1.43
brahim      1.43
quire       1.41
Activations Density: 0.010%