Explanations
references to "subject" and its variations
Neuron Alignment
Index   Value   % of L₁
50      +0.21   1.3%
1677    +0.13   0.8%
528     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1677    +0.21      0.03
260     +0.13      0.03
1637    +0.11      0.03
Negative Logits
<bos>       -3.11
丹          -0.64
public      -0.64
int         -0.63
//@         -0.61
/*++        -0.60
realize     -0.59
Italijani   -0.59
win         -0.59
estacks     -0.59
Positive Logits
increa      1.63
stockholm   1.61
Juf         1.58
affor       1.57
bandung     1.56
maneu       1.55
accla       1.54
unlaw       1.53
jaya        1.53
reluct      1.51
Activations Density 0.027%