INDEX
Explanations
phrases related to personal characteristics, actions, and beliefs
Neuron Alignment
Index   Value   % of L₁
1842    +0.34   1.2%
1577    +0.19   0.7%
198     +0.15   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1842    +0.34      0.14
1577    +0.19      0.14
184     +0.15      0.03
Negative Logits
<bos>             -1.14
kasarigan         -0.64
ivelany           -0.58
Referencie        -0.58
tagHelperRunner   -0.57
Atsauces          -0.57
UnusedPrivate     -0.55
Enllaços          -0.55
Viited            -0.54
كومونز            -0.52
Positive Logits
ananas     0.65
porc       0.64
pican      0.64
posX       0.62
richText   0.61
marte      0.60
zucca      0.60
maroc      0.59
moza       0.58
tenda      0.58
Activations Density 2.516%