Explanations
expressions related to personal experiences and reflections
Neuron Alignment
Index   Value   % of L₁
24      +0.14   0.8%
104     +0.13   0.7%
469     +0.13   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
24      +0.14      0.12
331     +0.13      0.10
299     +0.13      0.09
Negative Logits
Token     Logit
Ļª        -3.22
ĨĴ        -3.13
ĸ         -3.00
ĩ         -2.90
Ń         -2.88
↵         -2.84
(blank)   -2.84
(blank)   -2.84
(blank)   -2.84
↵         -2.84
Positive Logits
Token      Logit
nai        1.59
really     1.51
ipper      1.42
subscrib   1.42
apan       1.41
symbol     1.36
treasure   1.31
had        1.29
ded        1.29
might      1.28
Activations Density 2.407%