INDEX
Explanations
phrases related to copyright infringement and reproduction permissions
Neuron Alignment
Index    Value    % of L₁
90       +0.13    0.6%
50       +0.12    0.5%
78       +0.11    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
78       +0.13       0.04
90       +0.12       0.02
872      +0.11       0.03
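The two columns above measure different things, which is why a neuron can have a Pearson correlation of +0.13 and a cosine similarity of only 0.04. The page does not say which vectors each column is computed from, so the following is only a minimal sketch, assuming both are taken over two equal-length lists of numbers. The function names and the sample values are hypothetical and are not from the dashboard.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)


# Hypothetical activation values, for illustration only.
feature_acts = [0.0, 0.0, 1.5, 0.0, 0.7]
neuron_acts = [0.2, -0.1, 0.9, 0.1, 0.3]

print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity(feature_acts, neuron_acts))
```

Because Pearson correlation subtracts the mean first, it ignores a constant offset that cosine similarity does not, so the two numbers generally differ.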
Negative Logits
<bos>                -2.47
/***                 -0.73
/**                  -0.70
ⓧ                    -0.66
(token not shown)    -0.63
లాలు                 -0.60
<?                   -0.59
<?                   -0.59
#![                  -0.58
/*!                  -0.58
Positive Logits
bandung          1.19
jaya             1.12
lele             1.12
saar             1.10
haup             1.09
hcm              1.04
reproductions    1.04
allah            1.02
karna            1.02
kasa             1.01
Activations Density 0.196%
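
The page does not define activation density. A common reading is the fraction of tokens on which the feature has a non-zero activation, and the sketch below assumes that definition. The function name, the threshold parameter, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumes "density" means the share of tokens on which the feature fires;
    the dashboard itself does not state the definition.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 1000 tokens.
sample = [0.0] * 998 + [0.4, 1.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.196% would mean the feature fires on roughly 2 of every 1000 tokens.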