INDEX
Explanations
phrases indicating importance or significance
Neuron Alignment
Index    Value    % of L₁
50       +0.19    0.7%
1351     +0.08    0.3%
1473     +0.07    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1473     +0.19       0.05
1657     +0.08       0.05
1869     +0.07       0.04
Negative Logits
<bos>          -2.36
<?             -0.79
intersper      -0.68
inaugurate     -0.66
/***           -0.65
assiste        -0.63
<!--           -0.62
ⓧ              -0.61
Дереккөздер    -0.61
endow          -0.61
Positive Logits
Minang       0.80
ananas       0.79
bandung      0.76
télévis      0.75
chré         0.74
camry        0.74
useRouter    0.72
brune        0.71
jawa         0.70
COMPOUNDS    0.67
Activations Density 0.347%