INDEX
Explanations
instances of the term "act" or related variations
Neuron Alignment
Index    Value    % of L₁
274      +0.12    0.7%
296      +0.11    0.6%
376      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
296      +0.12       0.02
352      +0.11       0.02
264      +0.11       0.02
Negative Logits
Token           Logit
>`              -2.10
>';             -1.65
...'            -1.61
acetic          -1.56
omitempty       -1.52
ARC             -1.52
...?"           -1.51
>'              -1.48
've             -1.47
respectively    -1.47
Positive Logits
Token      Logit
ivities    1.93
uary       1.89
cule       1.88
ivism      1.75
icals      1.68
case       1.67
ivated     1.64
ism        1.62
ical       1.61
uelle      1.60
Activations Density 0.190%