INDEX
Explanations
phrases related to removal or elimination
Neuron Alignment
Index    Value    % of L₁
50       +0.19    0.9%
1806     +0.12    0.6%
597      +0.11    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1806     +0.19       0.03
1142     +0.12       0.02
204      +0.11       0.02
Negative Logits
<bos>           -2.96
/***            -0.82
intersper       -0.73
endow           -0.67
//*/            -0.66
(blank)         -0.66
strove          -0.65
amass           -0.65
rehabilitate    -0.64
overcrow        -0.62
Positive Logits
kram       1.07
lele       1.06
saar       1.05
meis       1.05
seksi      1.01
maksi      1.01
keramik    1.01
plak       1.01
nomine     1.00
ananas     0.99
Activations Density 0.235%