INDEX
Explanations
instances of the word "rely."
Neuron Alignment
Index   Value   % of L₁
32      +0.14   0.8%
184     +0.12   0.6%
450     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
450     +0.14      0.03
32      +0.12      0.03
386     +0.11      0.01
Negative Logits
ĨĴ        -2.69
Ļª        -2.38
²         -2.36
»¿        -2.31
Ĩ         -2.30
(blank)   -2.29
(blank)   -2.29
č↵č↵      -2.29
(blank)   -2.29
↵         -2.29
Positive Logits
riers       1.65
extrinsic   1.60
lement      1.52
inductive   1.51
partly      1.50
ESULT       1.48
inert       1.48
ries        1.47
ichte       1.45
coat        1.43
Activations Density 0.293%