INDEX
Explanations
phrases related to efforts and commitments
Neuron Alignment
Index   Value   % of L₁
674     +0.09   0.3%
1499    +0.08   0.2%
257     +0.07   0.2%
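This sketch shows one way to read the Neuron Alignment columns. It assumes something the dashboard does not state: Value is an entry of this feature's decoder vector over model neurons, and % of L₁ is that entry's absolute size as a share of the vector's L₁ norm. The helper `neuron_alignment` and the toy weights are hypothetical. Ranking by absolute weight is also an assumption, since the dashboard might rank by signed value.

```python
import numpy as np

def neuron_alignment(decoder_vec: np.ndarray, k: int = 3):
    """Hypothetical helper: top-k neurons of a feature's decoder vector.

    Assumption: neurons are ranked by absolute weight, and "% of L1" is
    |weight| divided by the L1 norm of the whole decoder vector.
    """
    l1_norm = np.abs(decoder_vec).sum()
    top = np.argsort(-np.abs(decoder_vec))[:k]  # indices of the largest |weight|
    return [(int(i), float(decoder_vec[i]), float(abs(decoder_vec[i]) / l1_norm))
            for i in top]

# Toy 6-neuron decoder vector; made-up numbers, not this feature's weights.
w = np.array([0.05, -0.02, 0.09, 0.01, -0.08, 0.03])
for idx, value, share in neuron_alignment(w):
    print(f"{idx:<6}{value:+.2f}  {share:.1%}")
```

Because each share is divided by the full L₁ norm, small values such as 0.3% suggest the feature's decoder direction is spread over many neurons rather than concentrated on a few.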
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1415    +0.09           0.04
105     +0.08           0.04
1038    +0.07           0.04
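This sketch covers the two Correlated Neurons columns and assumes both are computed over the same sample of tokens. Pearson correlation is taken as the usual mean-centered correlation between the feature's activations and a neuron's activations. Cosine similarity is assumed to be the uncentered version of the same quantity. The helper `correlations` and the toy data are hypothetical.

```python
import numpy as np

def correlations(feature_acts: np.ndarray, neuron_acts: np.ndarray):
    """Hypothetical helper.

    feature_acts: shape (n_tokens,), one feature's activations.
    neuron_acts:  shape (n_tokens, n_neurons), neuron activations on the same tokens.
    Returns (pearson, cosine), each of shape (n_neurons,).
    Constant (zero-variance) inputs would divide by zero; not handled here.
    """
    # Pearson correlation: cosine similarity of the mean-centered vectors.
    f_c = feature_acts - feature_acts.mean()
    n_c = neuron_acts - neuron_acts.mean(axis=0)
    pearson = (f_c @ n_c) / (np.linalg.norm(f_c) * np.linalg.norm(n_c, axis=0))
    # Assumed meaning of "Cosine Sim.": the same formula without centering.
    cosine = (feature_acts @ neuron_acts) / (
        np.linalg.norm(feature_acts) * np.linalg.norm(neuron_acts, axis=0))
    return pearson, cosine

# Toy data over 4 tokens and 2 neurons (made-up numbers).
feature = np.array([0.0, 1.0, 0.0, 2.0])
neurons = np.array([[0.0, 1.0],
                    [2.0, 0.0],
                    [0.0, 1.0],
                    [4.0, 0.0]])
p, c = correlations(feature, neurons)
print(np.round(p, 2), np.round(c, 2))
```

Centering is the only difference between the two formulas. For sparse, mostly-zero activations, the two numbers can therefore diverge, as they do in the table above.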
Negative Logits
reluct     -1.52
suscep     -1.50
wherea     -1.47
unden      -1.43
maneu      -1.42
impra      -1.41
depic      -1.39
disagre    -1.39
increa     -1.39
volunte    -1.38
Positive Logits
improve       0.90
ensure        0.87
prevent       0.84
ensuring      0.79
improving     0.79
protect       0.79
reduce        0.78
strengthen    0.76
enhance       0.74
promote       0.71
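The logit tables list the vocabulary tokens whose logits this feature most decreases and most increases. This sketch assumes these are direct effects: the feature's decoder direction is multiplied by the model's unembedding matrix, and any final LayerNorm scaling is ignored. The helper `logit_effects`, the 2-dimensional toy model, and its weights are hypothetical. The token strings are borrowed only as labels.

```python
import numpy as np

def logit_effects(decoder_vec: np.ndarray, W_U: np.ndarray, vocab: list, k: int = 10):
    """Hypothetical helper: direct logit effect of one feature.

    decoder_vec: shape (d_model,), the feature's decoder direction.
    W_U:         shape (d_model, d_vocab), the unembedding matrix.
    Assumption: any final LayerNorm scaling is ignored.
    """
    logits = decoder_vec @ W_U            # shape (d_vocab,)
    order = np.argsort(logits)            # token indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy model: d_model = 2, six-token vocabulary, made-up weights.
vocab = ["improve", "reluct", "the", "ensure", "suscep", "and"]
W_U = np.array([[0.9, -1.5, 0.1, 0.8, -1.4, 0.0],
                [0.3,  0.1, 0.0, -0.4, 0.2, 0.5]])
d = np.array([1.0, 0.0])
neg, pos = logit_effects(d, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```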
Activation Density: 0.407%
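This sketch follows the common convention, assumed here, that activation density is the fraction of sampled tokens on which the feature's activation is above zero. Under that reading, 0.407% means the feature fires on roughly 4 tokens in every 1,000. The helper and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(feature_acts: np.ndarray, threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens where the feature fires.

    Assumption: "fires" means activation strictly above `threshold` (0 by default).
    """
    return float((feature_acts > threshold).mean())

# 10 toy tokens; the feature fires on one of them (made-up numbers).
acts = np.array([0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(f"Activation Density: {activation_density(acts):.3%}")  # Activation Density: 10.000%
```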