INDEX
Explanations
terms associated with the concept of devaluation or undermining worth
Neuron Alignment
Index   Value   % of L₁
50      +0.19   1.1%
1145    +0.16   0.9%
1404    +0.13   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1145    +0.19      0.04
1404    +0.16      0.04
900     +0.13      0.03
Negative Logits
Token          Logit
<bos>          -2.51
springfox      -0.81
<?             -0.68
/*++           -0.61
disbur         -0.60
banish         -0.59
awsze          -0.58
signore        -0.55
proprietario   -0.55
tská           -0.54
Positive Logits
Token          Logit
Rine           1.09
soulign        1.07
Dewi           1.04
De             1.04
compréhen      1.04
Deh            1.02
Deg            1.02
De             1.00
véhic          0.99
Dede           0.99
Activations Density: 0.132%