INDEX
Explanations
references to online discussion platforms and community interactions
Neuron Alignment
Index    Value    % of L₁
376      +0.14    0.9%
184      +0.13    0.8%
96       +0.09    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
184      +0.14       0.01
264      +0.13       0.01
96       +0.09       0.01
Negative Logits
normal         -1.58
sed            -1.57
]>             -1.52
_________      -1.52
outgoing       -1.52
yours          -1.51
heart          -1.51
saline         -1.43
transgender    -1.43
dece           -1.43
Positive Logits
erne      2.29
helf      2.21
ière      2.20
erate     2.04
arium     1.91
ware      1.85
garten    1.84
ĻĤ        1.82
pieces    1.81
aji       1.77
Activations Density 0.006%