Explanations
proper nouns and specific terms related to names and titles
Neuron Alignment
Index   Value   % of L₁
1177    +0.17   0.7%
1741    +0.13   0.5%
1978    +0.12   0.5%
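The "% of L₁" column is not defined on the page. One plausible reading, offered here only as an assumption, is that it gives the share of a weight vector's L1 norm (the sum of absolute values) carried by a single neuron. The sketch below illustrates that reading with a hypothetical `percent_of_l1` helper and made-up weights; it is not taken from the dashboard's own code.

```python
def percent_of_l1(weights, index):
    """Share of the vector's L1 norm held by weights[index], in percent.

    Assumption: "% of L1" means |w_i| / sum_j |w_j| * 100.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm * 100.0


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
example_weights = [0.5, -0.25, 0.25]
share = percent_of_l1(example_weights, 0)  # 0.5 / 1.0 * 100
```

Under this reading, a value of +0.17 at 0.7% would imply a total L1 norm of roughly 24, which is consistent with the other two rows after rounding.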
Correlated Neurons
Index   P. Corr.   Cos Sim.
678     +0.17      0.04
80      +0.13      0.03
1978    +0.12      0.04
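Assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." abbreviates cosine similarity, the two columns measure related but different things: Pearson correlation is the cosine similarity of mean-centered vectors. The page does not say which vectors are being compared, so the inputs in this sketch are hypothetical and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity after centering."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical vectors: perfectly anti-correlated, yet with positive cosine
# similarity because both have only positive entries.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
corr = pearson_correlation(u, v)
cos = cosine_similarity(u, v)
```

Because centering can change the result substantially, the two measures need not agree in size or even in sign.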
Negative Logits
<bos>                  -2.70
ⓧ                      -1.14
[token not rendered]   -1.12
<?                     -0.95
/**                    -0.94
#                      -0.77
initComponents         -0.77
/*                     -0.71
intios                 -0.67
contentLoaded          -0.67
Positive Logits
aen      1.84
Juf      1.72
thut     1.72
dises    1.71
nece     1.67
emphat   1.65
inev     1.64
mef      1.63
meis     1.63
fta      1.63
Activations Density 0.175%