INDEX
Explanations
questions and exclamations
Neuron Alignment
Index   Value   % of L₁
50      +0.18   0.6%
814     +0.09   0.3%
789     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1109    +0.18      0.03
851     +0.09      0.03
1295    +0.09      0.03
Negative Logits
<bos>           -2.31
<?              -0.64
/***            -0.61
Autoritní       -0.57
Enllaços        -0.57
AsUp            -0.57
modelBuilder    -0.57
########.       -0.54
djangoproject   -0.53
osoba           -0.52
Positive Logits
maneu      1.01
signora    0.97
maroc      0.89
reluct     0.88
pylab      0.87
psycopg    0.87
disagre    0.86
disreg     0.81
impra      0.81
fortn      0.80
Activations Density: 0.114%