INDEX
Explanations
references to ownership or personal possession
Neuron Alignment
Index    Value    % of L₁
369      +0.16    1.0%
125      +0.11    0.7%
315      +0.11    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
329      +0.16       0.01
315      +0.11       0.01
125      +0.11       0.01
Negative Logits
Token      Logit
ening      -1.74
orphic     -1.65
while      -1.60
ogether    -1.57
ocent      -1.48
eners      -1.44
raits      -1.43
than       -1.41
ener       -1.41
()</       -1.37
Positive Logits
Token      Logit
©          2.14
ĸ´         1.80
¾          1.80
license    1.74
¬          1.74
¸          1.74
·          1.74
¶          1.72
ĥ          1.71
ĵ          1.69
Activations Density: 0.040%