INDEX
Explanations
occurrences of the name "Safer."
Neuron Alignment
Index    Value    % of L₁
369      +0.21    1.2%
144      +0.14    0.8%
323      +0.13    0.7%
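The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's direction, although the page itself does not define it. Under that assumed reading, the rows above would imply an L₁ norm of roughly 17 to 19 (for example 0.21 / 0.012 ≈ 17.5), allowing for rounding. The sketch below illustrates the calculation on a made-up vector; `l1_share` and the `toy` values are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights):
    """Return each weight's absolute value as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The source page does not
    state its definition, so treat this as one plausible reading.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(w) / l1 for w in weights]


# Hypothetical toy direction, not the feature's actual weights.
toy = [0.5, -0.25, 0.25]
shares = l1_share(toy)

for index, (w, s) in enumerate(zip(toy, shares)):
    print(f"{index:>5}  {w:+.2f}  {s:.1%}")
```

Because every share is taken relative to the same norm, the shares over the full vector sum to 1, so small percentages such as those in the table would indicate a direction spread across many neurons rather than concentrated in a few.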
Correlated Neurons
Index    P. Corr.    Cos Sim.
323      +0.21       0.02
369      +0.14       0.02
241      +0.13       0.02
Negative Logits
tons       -2.35
MENT       -1.98
nant       -1.74
wise       -1.68
sime       -1.67
MENTS      -1.67
t          -1.59
ANC        -1.53
mente      -1.53
ontally    -1.52
Positive Logits
unders     1.67
ÅĨ         1.64
³          1.60
ÅĽci       1.55
zerba      1.52
pere       1.49
quel       1.47
iat        1.43
iale       1.42
uri        1.41
Activations Density 0.095%