INDEX
Explanations
instances of the word "replace" or related terms
Neuron Alignment
Index    Value    % of L₁
50       +0.19    1.0%
1272     +0.10    0.5%
32       +0.09    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
9        +0.19       0.03
61       +0.10       0.03
1272     +0.09       0.03
Negative Logits
<bos>            -3.21
public           -0.67
/***             -0.65
ⓧ                -0.61
///**            -0.59
ostringstream    -0.58
circ             -0.57
立               -0.57
earn             -0.57
/**              -0.57
Positive Logits
stockholm    1.56
lele         1.56
aen          1.52
fta          1.51
ftu          1.49
bandung      1.48
Juf          1.48
thut         1.44
hcm          1.44
wien         1.43
Activations Density 0.110%