Explanations
references to shifts or changes in a political context
Neuron Alignment
Index    Value    % of L₁
61       +0.11    0.4%
1870     +0.11    0.4%
188      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
188      +0.11       0.03
61       +0.11       0.02
1506     +0.11       0.02
Negative Logits
péné         -0.66
Madura       -0.60
vogli        -0.58
avrebbero    -0.50
avete        -0.49
Puro         -0.47
avesse       -0.47
guadag       -0.46
Manus        -0.46
doveva       -0.46
Positive Logits
shift        1.26
shift        1.25
shifts       1.17
Shift        1.17
Shift        1.12
shifting     1.08
shifted      1.07
shifting     1.07
shifts       1.03
Shifts       1.00
Activations Density 0.076%