Explanations
terms and phrases related to corruption
Neuron Alignment
Index   Value   % of L₁
156     +0.12   0.7%
474     +0.12   0.7%
82      +0.12   0.7%
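One way to read the Neuron Alignment table rests on an assumption the page does not state. Value would be the feature's decoder weight on each neuron, and % of L₁ would be that weight's magnitude divided by the L1 norm of the whole decoder vector. A minimal pure-Python sketch of that computation, using a hypothetical helper `neuron_alignment`:

```python
def neuron_alignment(decoder_vector, k=3):
    # Assumption: 'Value' is the decoder weight on each neuron and
    # '% of L1' is |weight| divided by the L1 norm of the whole vector.
    l1_norm = sum(abs(w) for w in decoder_vector)
    # Rank neuron indices by weight magnitude, largest first
    # (sorted() is stable, so ties keep their original index order).
    order = sorted(range(len(decoder_vector)),
                   key=lambda i: abs(decoder_vector[i]), reverse=True)
    # Return (index, value, fraction of L1) for the top-k neurons.
    return [(i, decoder_vector[i], abs(decoder_vector[i]) / l1_norm)
            for i in order[:k]]
```

Multiplying the returned fraction by 100 gives the percentage shown in the table.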
Correlated Neurons
Index   P. Corr.   Cos Sim.
82      +0.12      0.02
474     +0.12      0.01
347     +0.12      0.01
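One plausible reading of the Correlated Neurons columns is an assumption, not stated on the page. P. Corr. would be the Pearson correlation between the feature's activations and a neuron's activations over the same tokens. Cos Sim. would be the cosine similarity between the feature's weight vector and the neuron's weight vector. Both statistics are sketched below with hypothetical helper names:

```python
import math

def pearson(xs, ys):
    # Pearson correlation of two equal-length activation series.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def cosine_similarity(u, v):
    # Cosine of the angle between two weight vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))
```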
Negative Logits
į          -1.96
ĥ½         -1.88
»¿         -1.86
MOESM      -1.86
woke       -1.83
ĺ          -1.79
Į          -1.75
lla        -1.66
Ĺ          -1.64
Ķ          -1.63
Positive Logits
schedules  1.86
icum       1.81
tasks      1.81
regimes    1.79
nier       1.77
ically     1.76
limits     1.74
istically  1.69
areas      1.68
ulent      1.68
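Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix. The most negative and most positive tokens are then kept. Whether this page uses exactly that method is an assumption. A toy pure-Python sketch follows, in which `decoder_vector`, `unembed` (indexed as `unembed[residual_dim][token]`) and `vocab` are hypothetical inputs:

```python
def logit_effects(decoder_vector, unembed, vocab, k=10):
    # Direct logit effect of each token: dot product of the decoder
    # direction with that token's column of the unembedding matrix.
    # Assumption: unembed is indexed as unembed[residual_dim][token].
    scores = [
        sum(d * unembed[j][t] for j, d in enumerate(decoder_vector))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return negative, positive
```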
Activation Density: 0.051%
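Activation density is read here as the fraction of dataset tokens on which the feature's activation is above zero. Both the zero threshold and that definition are assumptions, not stated on the page. A minimal sketch with a hypothetical helper `activation_density`:

```python
def activation_density(activations, threshold=0.0):
    # Assumption: a feature "fires" on a token when its activation
    # exceeds the threshold (0 by default).
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Example: 1 firing token out of 2,000 prints 0.050%
print(f"{activation_density([1.5] + [0.0] * 1999):.3%}")
```

On this reading, a density of 0.051% means the feature is active on roughly 1 token in 2,000.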