INDEX
Explanations
the occurrence of thoughts or contemplative states
Neuron Alignment
Index   Value   % of L₁
13      +0.12   0.7%
115     +0.12   0.7%
348     +0.11   0.6%
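The "% of L₁" column is not defined on this page. One plausible reading, assumed here, is that it reports each neuron's absolute weight as a share of the L₁ norm (sum of absolute values) of the whole weight vector. A minimal sketch under that assumption follows; `l1_share` and the toy vector are hypothetical and are not taken from the feature's real weights.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means one entry's absolute value divided by the
    sum of absolute values of all entries. The page does not confirm this.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy vector (not the feature's real weights).
toy = [0.5, -0.25, 0.25]
print(f"{l1_share(toy, 0):.1f}%")  # prints 50.0%
```

Under this reading, a value of +0.12 at 0.7% would imply an L₁ norm of roughly 17 for the full vector, which is consistent with +0.11 rounding to 0.6%.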
Correlated Neurons
Index   P. Corr.   Cos Sim.
15      +0.12      0.03
471     +0.12      0.02
115     +0.11      0.02
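"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity. The page does not say which vectors each one is computed over, so the sketch below only illustrates how the two measures differ in general: Pearson correlation is cosine similarity applied after subtracting each vector's mean. The function names and toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a],
                             [y - mean_b for y in b])


# Hypothetical toy data: b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
print(round(pearson_correlation(a, b), 6))  # perfectly correlated
print(cosine_similarity(a, b) < 1.0)        # but not perfectly aligned
```

Because the two measures treat offsets differently, and may be computed over different quantities on the dashboard, there is no contradiction in a neuron showing a correlation of +0.12 alongside a cosine similarity of 0.03.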
Negative Logits
Token   Logit
ĭ       -2.22
¨       -2.20
ľĵ      -2.18
ĥ       -2.14
ľ       -1.94
·¸      -1.94
·       -1.92
Ģ       -1.92
ĻĤ      -1.90
ķ       -1.83
Positive Logits
Token     Logit
aloud     1.98
about     1.88
fully     1.84
lessly    1.84
about     1.70
fulness   1.66
goodbye   1.63
About     1.60
ful       1.56
prov      1.49
Activations Density: 0.163%