Explanations
phrases related to advocacy and pushing for change
Neuron Alignment
Index    Value    % of L₁
397      +0.15    0.5%
1350     +0.14    0.5%
1993     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
397      +0.15       0.04
1350     +0.14       0.04
1993     +0.12       0.04
Negative Logits
Irak             -0.49
Coupé            -0.47
autorytatywna    -0.47
Bao              -0.47
fficio           -0.46
Quinta           -0.45
Gallardo         -0.44
Hela             -0.44
Scénario         -0.43
Spart            -0.43
Positive Logits
push       1.35
pushes     1.27
pushing    1.25
pushed     1.25
Push       1.24
push       1.23
Pushing    1.22
Pushing    1.21
Push       1.14
pushed     1.13
Activations Density: 0.075%