INDEX
Explanations
references to security measures and return policies in online stores
Neuron Alignment
Index   Value   % of L₁
1548    +0.10   0.3%
1491    +0.09   0.3%
228     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
690     +0.10      0.05
1343    +0.09      0.04
1120    +0.07      0.04
Negative Logits
Minang      -1.09
disagre     -1.08
withal      -1.01
kani        -0.97
quitted     -0.94
vainly      -0.94
bandung     -0.94
lola        -0.93
tolerably   -0.88
apprehen    -0.87
Positive Logits
Sung         2.06
sung         2.01
Sung         2.00
sung         1.87
sunglasses   1.15
Sunglasses   1.05
SUNG         0.94
alnız        0.82
Platinum     0.76
Platinum     0.72
Activations Density 0.333%