INDEX
Explanations
repetitive patterns or symbols in the text
Neuron Alignment
Index   Value   % of L₁
264     +0.14   0.7%
180     +0.11   0.6%
118     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
53      +0.14      0.44
438     +0.11      0.36
56      +0.11      0.13
Negative Logits
Token       Logit
Professor   -1.43
Computing   -1.39
arbitrary   -1.38
Xia         -1.37
Brownian    -1.36
bl          -1.34
nai         -1.30
let         -1.29
caution     -1.27
least       -1.25
Positive Logits
Token       Logit
ĻĤ          2.57
↵           2.56
↵ âĢĥ       2.56
(blank)     2.56
(blank)     2.56
(blank)     2.56
↵ ↵         2.56
(blank)     2.56
↵           2.56
č↵          2.56
Activations Density 8.204%
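
Several of the positive-logit tokens above (ĻĤ, âĢĥ, č) look like the printable stand-ins that GPT-2-style byte-level BPE tokenizers use for raw bytes. The sketch below rebuilds that byte-to-character table and inverts it to recover the underlying bytes. It assumes the dashboard's model uses this tokenizer family, which the page itself does not state; the names `bytes_to_unicode`, `CHAR_TO_BYTE`, and `token_to_bytes` are illustrative. The ↵ glyph is not part of the table (it appears to be the dashboard's own newline marker), so it is not handled here.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every other byte is assigned the next unused code point from 256 upward.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: display character -> raw byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(display: str) -> bytes:
    """Recover the raw bytes behind a token's display string."""
    return bytes(CHAR_TO_BYTE[ch] for ch in display)


print(token_to_bytes("âĢĥ"))  # b'\xe2\x80\x83', the UTF-8 encoding of U+2003 (em space)
print(token_to_bytes("č"))    # b'\r', a carriage return
print(token_to_bytes("ĻĤ"))   # b'\x99\x82', continuation bytes only, not a full character
```

Under this assumption, the top positive logits are all whitespace or line-break tokens (or fragments of multi-byte characters), which would also explain why several token cells render as blank.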