INDEX
Explanations
expressions of personal enthusiasm or preference
Neuron Alignment
Index   Value   % of L₁
50      +0.26   0.9%
605     +0.10   0.4%
658     +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1470    +0.26      0.03
605     +0.10      0.01
1773    +0.08      0.03
Negative Logits
<bos>       -1.82
protected   -0.68
/**         -0.67
var         -0.64
became      -0.64
appeared    -0.63
become      -0.63
enumerate   -0.63
public      -0.62
becomes     -0.62
Positive Logits
Minang      1.38
increa      1.36
reluct      1.35
impra       1.31
cytoplas    1.30
swarovski   1.28
affor       1.27
quoique     1.27
disreg      1.25
maneu       1.25
Activations Density: 0.291%