Explanations
references to scientific publications and academic journals
Neuron Alignment
Index   Value   % of L₁
50      +0.23   0.9%
1343    +0.09   0.3%
86      +0.07   0.3%
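One plausible reading of the % of L₁ column is each neuron's weight as a share of the L₁ norm of the feature's decoder vector. Under that reading, +0.23 at 0.9% implies an L₁ norm of roughly 25. The sketch below assumes that definition and ranks neurons by signed weight. `neuron_alignment` is a hypothetical helper, not the dashboard's own code.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (index, weight, share of L1 norm) for the k largest weights.

    Assumption (hypothetical): "% of L1" means |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by signed weight, largest first.
    top = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)[:k]
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in top]


if __name__ == "__main__":
    # Toy decoder vector; real ones have thousands of entries.
    for idx, w, share in neuron_alignment([0.5, -0.2, 0.3, 0.0], k=2):
        print(idx, f"{w:+.2f}", f"{share:.1%}")
```

Sorting by signed weight matches the "+" signs shown in the Value column; ranking by absolute weight would be the alternative if large negative weights should also count as aligned.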
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
342     +0.23           0.02
86      +0.09           0.02
460     +0.07           0.02
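Assuming both columns compare the feature's activations with each neuron's activations over the same sample of tokens, the two metrics differ only in mean-centring: Pearson correlation is the cosine similarity of the mean-centred vectors. If the dashboard's cosine similarity instead compares weight vectors, this sketch does not apply. `pearson`, `_cosine`, and `correlated_neurons` are hypothetical helpers.

```python
import math


def _cosine(x, y):
    """Cosine similarity; all-zero vectors are treated as uncorrelated."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation = cosine similarity of mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return _cosine([a - mx for a in x], [b - my for b in y])


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    feature_acts: list of n_tokens floats.
    neuron_acts: list of per-neuron activation lists, each n_tokens long.
    Returns (neuron index, Pearson corr., cosine sim.) tuples.
    """
    rows = [
        (idx, pearson(feature_acts, acts), _cosine(feature_acts, acts))
        for idx, acts in enumerate(neuron_acts)
    ]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]
```

Because centring removes each vector's mean, a neuron can have a sizeable Pearson correlation with a sparse feature while its uncentred cosine similarity stays small, or the reverse, depending on the two vectors' baselines.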
Negative Logits
<bos>      -2.33
ⓧ          -1.21
intersper  -0.84
<?         -0.80
/***       -0.75
/**        -0.75
disbur     -0.74
<?         -0.70
łgorzata   -0.66
defray     -0.62
Positive Logits
siyah      0.88
pylab      0.84
dison      0.79
mavi       0.73
usak       0.71
onaldo     0.71
baya       0.66
yanto      0.66
uwu        0.65
lijah      0.65
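Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that procedure and ignores any final LayerNorm. `logit_effects` and its argument layout (unembedding stored as d_model rows of vocab_size floats) are hypothetical.

```python
def logit_effects(decoder_vec, W_U, vocab, k=5):
    """Direct logit effect of a feature direction on every vocabulary token.

    decoder_vec: list of d_model floats.
    W_U: list of d_model rows, each a list of len(vocab) floats.
    Returns (most_negative, most_positive) lists of (token, value).
    """
    vocab_size = len(vocab)
    # logits[t] = sum_r decoder_vec[r] * W_U[r][t]
    logits = [
        sum(decoder_vec[r] * W_U[r][t] for r in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]
    order = sorted(range(vocab_size), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive
```

Repeated surface strings such as the two `<?` entries can arise when distinct tokenizer entries (for example, with and without a leading space) render identically in a list like this.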
Activations Density 0.040%
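An activation density of 0.040% is a fraction of 0.0004, i.e. the feature fires on about 4 of every 10,000 tokens. A minimal sketch, assuming "active" means an activation strictly above zero; `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    return sum(1 for a in acts if a > threshold) / len(acts)


if __name__ == "__main__":
    acts = [1.0] * 4 + [0.0] * 9996  # 4 active tokens out of 10,000
    print(f"{activation_density(acts):.3%}")
```

A nonzero threshold would make the reported density depend on that cutoff, so it should be stated alongside the number.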