Explanations
contractions and possessive forms of pronouns
Neuron Alignment
Index    Value    % of L₁
478      +0.21    0.9%
2019     +0.15    0.7%
1699     +0.12    0.5%
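The following is a minimal sketch of how a neuron-alignment table like this could be produced. It rests on three assumptions that the dashboard does not state:

- "Value" is the feature's decoder weight on each neuron.
- "% of L₁" is that weight's magnitude as a share of the decoder row's L1 norm.
- Neurons are ranked by magnitude.

The helper `neuron_alignment` is hypothetical.

```python
import numpy as np


def neuron_alignment(decoder_row, k=3):
    """Hypothetical helper: top-k neurons for one feature's decoder row.

    Assumes decoder_row holds one weight per neuron, and that the
    dashboard's "% of L1" is |w_i| / sum_j |w_j|, expressed in percent.
    """
    w = np.asarray(decoder_row, dtype=float)
    l1 = np.abs(w).sum()
    # Rank neurons by weight magnitude, largest first (ranking rule assumed).
    top = np.argsort(-np.abs(w))[:k]
    return [(int(i), float(w[i]), float(100.0 * abs(w[i]) / l1)) for i in top]
```

Under these assumptions, a 0.9% share means the largest single neuron carries less than 1% of the row's L1 mass. In other words, the feature is spread over many neurons rather than aligned with one.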
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
478      +0.21            0.09
761      +0.15            0.07
817      +0.12            0.07
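Here is a sketch of a correlated-neurons computation, under these assumptions:

- "Pearson Corr." is the Pearson correlation between the feature's activations and each neuron's activations across the same tokens.
- "Cosine Sim." is the uncentered cosine similarity of those same two activation vectors.

The function name `correlated_neurons` is hypothetical.

```python
import numpy as np


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Hypothetical helper: rank neurons by Pearson correlation with a feature.

    feature_acts: shape (n_tokens,); neuron_acts: shape (n_tokens, n_neurons).
    Assumes no neuron column is constant (that would divide by zero).
    """
    f = np.asarray(feature_acts, dtype=float)
    n = np.asarray(neuron_acts, dtype=float)
    # Uncentered cosine similarity between the feature and each neuron column.
    cos = (f @ n) / (np.linalg.norm(f) * np.linalg.norm(n, axis=0))
    # Pearson correlation is the cosine similarity of mean-centered vectors.
    fc = f - f.mean()
    nc = n - n.mean(axis=0)
    pearson = (fc @ nc) / (np.linalg.norm(fc) * np.linalg.norm(nc, axis=0))
    top = np.argsort(-pearson)[:k]
    return [(int(i), float(pearson[i]), float(cos[i])) for i in top]
```

Pearson correlation subtracts the means before comparing, while cosine similarity does not. For this reason the two columns can differ noticeably for sparse, mostly-zero activations like an SAE feature's.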
Negative Logits
Token            Logit
<bos>            -1.40
                 -1.10
<?               -1.08
ⓧ                -1.05
Messieurs        -1.00
/**              -1.00
hentai           -0.98
<?               -0.95
racon            -0.92
/*!              -0.89
Positive Logits
Token            Logit
()")             0.72
’                0.72
'                0.69
mathrm           0.65
Einzelnachweise  0.64
*/;              0.63
$'               0.62
s                0.61
=",              0.60
Collegamenti     0.59
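The negative and positive logit tables can be read as the feature's direct effect on the output vocabulary. Below is a minimal sketch, assuming that effect is the feature's decoder direction multiplied by the unembedding matrix `W_U`. It ignores the final LayerNorm, which is itself an assumption. The helper `logit_effects` and its parameters are hypothetical.

```python
import numpy as np


def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: tokens most suppressed / boosted by a feature.

    decoder_dir: shape (d_model,); W_U: shape (d_model, d_vocab);
    vocab: list of d_vocab token strings. Final LayerNorm is ignored.
    """
    logits = np.asarray(decoder_dir, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

Because this is a direct path through the unembedding only, it can miss effects that the feature has via later layers. That may help explain why the boosted tokens here (apostrophes, `s`) only partly match the "contractions and possessive forms" explanation.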
Activations Density: 0.285%
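Activation density is most plausibly the percentage of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the threshold is an assumption, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(feature_acts, threshold=0.0):
    """Hypothetical helper: percent of tokens where the feature is active.

    Assumes "active" means activation > threshold (default 0.0).
    """
    acts = np.asarray(feature_acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))
```

At the 0.285% shown above, the feature is active on roughly 285 of every 100,000 tokens, or about 1 token in 350.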