Explanations
the name "Dan" specifically
Neuron Alignment
Index   Value   % of L₁
1604    +0.14   0.8%
528     +0.14   0.8%
1573    +0.14   0.8%
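
The "% of L₁" column is not defined on this page. One plausible reading, assumed here and not confirmed by the source, is that each Value is one entry of the feature's weight vector over neurons, and the percentage is that entry's absolute value divided by the vector's L₁ norm. A minimal sketch of that calculation, using a hypothetical helper `l1_share` and a toy vector that is not taken from the dashboard:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of `weights`.

    Assumption: this mirrors the "% of L1" column; the dashboard itself
    does not state how the column is computed.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, not dashboard data.
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")  # prints 50.0%
```

Under this reading, a value of 0.14 at 0.8% would imply an L₁ norm of roughly 17.5, subject to rounding in the displayed figures.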
Correlated Neurons
Index   P. Corr.   Cos Sim.
1097    +0.14      0.03
1573    +0.14      0.02
1604    +0.14      0.02
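
The two columns above report different similarity measures, which is why the same neuron can score +0.14 on one and 0.02 on the other. The page does not say which vectors are compared, so the sketch below only illustrates the measures themselves on generic lists of numbers: Pearson correlation is cosine similarity applied after subtracting each vector's mean. The function names are illustrative, not part of any dashboard API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy inputs (hypothetical): perfectly anti-correlated, yet the raw
# vectors still point in broadly similar directions.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
print(round(pearson_correlation(u, v), 3))  # -1.0
print(round(cosine_similarity(u, v), 3))    # 0.714
```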
Negative Logits
Token              Logit
<bos>              -1.81
lateinit           -0.63
ContentAlignment   -0.59
interface          -0.59
                   -0.58
>>                 -0.58
Kích               -0.57
prime              -0.57
o                  -0.57
|                  -0.56
Positive Logits
Token      Logit
Juf        1.49
Dan        1.37
disagre    1.35
inev       1.34
shenan     1.33
increa     1.33
indestru   1.31
maneu      1.31
fortn      1.28
scrat      1.28
Activations Density: 0.249%
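
Activation density is commonly taken to be the fraction of sampled tokens on which a feature fires. The page does not give its exact definition or threshold, so the sketch below assumes "fires" means an activation strictly above zero. The helper name `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly greater than `threshold`.

    Assumption: a token counts as active when its activation exceeds
    `threshold`; the dashboard does not state which cutoff it uses.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: one active token out of four.
sample = [0.0, 0.0, 0.0, 1.5]
print(f"{activation_density(sample):.3%}")  # prints 25.000%
```

On this definition, a density of 0.249% would correspond to roughly one active token in every 400 sampled.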