INDEX
Explanations
references to recent events or studies
Neuron Alignment
Index   Value   % of L₁
308     +0.11   0.6%
11      +0.11   0.6%
142     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
308     +0.11      0.02
246     +0.11      0.02
418     +0.10      0.00
Negative Logits
immediately   -1.58
oso           -1.52
await         -1.51
evenly        -1.51
accordingly   -1.51
withstand     -1.50
surround      -1.48
IJ            -1.46
utmost        -1.45
trust         -1.44
Positive Logits
generations   1.79
oral          1.68
past          1.66
itations      1.64
decades       1.63
publication   1.62
occasions     1.62
citations     1.61
additions     1.58
comments      1.52
Activations Density: 0.057%