INDEX
Explanations
warnings, disclaimers, and legal information in text documents
Neuron Alignment
Index   Value   % of L₁
50      +0.22   1.0%
281     +0.12   0.5%
554     +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1056    +0.22      0.03
1376    +0.12      0.02
1334    +0.12      0.02
Negative Logits
<bos>         -2.26
              -0.77
quitted       -0.71
<?            -0.69
disbur        -0.64
/*!           -0.64
ⓧ             -0.64
alcançar      -0.61
superintend   -0.60
<?            -0.57
Positive Logits
igarette       0.67
Expt           0.66
venuto         0.65
maroc          0.65
tolu           0.65
pensato        0.65
italia         0.63
cæ             0.63
ados           0.62
noinspection   0.61
Activations Density 0.285%