INDEX
Explanations
descriptions of personal experiences and stories related to various topics
Neuron Alignment
Index   Value   % of L₁
50      +0.24   1.4%
1806    +0.11   0.6%
1068    +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1507    +0.24      0.07
1806    +0.11      0.07
1892    +0.11      0.07
Negative Logits
<bos>                    -3.42
/***                     -1.12
ⓧ                        -1.07
/**                      -0.92
(no visible text)        -0.91
/*                       -0.88
<?                       -0.84
<?                       -0.83
//};                     -0.75
///**                    -0.70
Positive Logits
maneu      1.06
épu        0.99
vété       0.96
maroc      0.94
fameux     0.93
héro       0.91
curieux    0.87
eiffel     0.86
milano     0.86
And        0.86
Activations Density 0.348%
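
The column labels above (% of L₁, P. Corr., Cos Sim., Activations Density) are not defined on this page. The sketch below shows one common way such quantities are computed, under stated assumptions: "% of L₁" is taken to mean an entry's absolute value divided by the L1 norm of the whole vector, and density is taken to mean the fraction of tokens on which the feature fires. All function names and the `threshold` parameter are hypothetical and are not taken from the dashboard's own code.

```python
import math


def l1_share(weights, index):
    """Assumed meaning of '% of L1': |w[index]| / sum(|w|)."""
    l1 = sum(abs(w) for w in weights)
    return abs(weights[index]) / l1 if l1 else 0.0


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered series."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


def activation_density(activations, threshold=0.0):
    """Assumed meaning of density: fraction of activations above threshold."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > threshold) / len(activations)
```

Under these assumptions, a reported density of 0.348% would correspond to the feature firing on roughly 35 of every 10,000 sampled tokens.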