Explanations
Tweets related to political commentary and criticism
Neuron Alignment
Index   Value   % of L₁
1150    +0.17   0.5%
2034    +0.16   0.5%
1535    +0.12   0.4%
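The Value and % of L₁ columns can be read as follows. This reading is an assumption, not something stated on this page: Value is taken to be the feature's decoder weight on each neuron, and % of L₁ that weight's absolute value divided by the L₁ norm of the whole decoder vector. A minimal sketch under that assumption; `neuron_alignment` is a hypothetical helper, not part of any dashboard library:

```python
def neuron_alignment(decoder_vec, k=3):
    """Rank neurons by |weight| in a feature's decoder vector.

    Assumption: "% of L1" = |w_i| / sum_j |w_j|, so all shares sum to 1.
    Returns (neuron_index, signed_weight, share_of_l1) tuples, largest first.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Sort neuron indices by absolute weight, largest first.
    order = sorted(range(len(decoder_vec)),
                   key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in order[:k]]


# Toy 4-neuron decoder vector (made-up numbers, not from this feature).
rows = neuron_alignment([0.1, -0.5, 0.2, 0.2], k=2)
for idx, value, share in rows:
    print(f"{idx}  {value:+.2f}  {share:.1%}")
```

Under this reading, a top share of only 0.5% means no single neuron dominates: the decoder direction is spread across many neurons rather than aligned with any one of them.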
Correlated Neurons
Index   Pearson corr.   Cosine sim.
648     +0.17           0.02
308     +0.16           0.02
1150    +0.12           0.01
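Both columns are standard similarity measures. Which quantities are compared is an assumption here: the Pearson correlation is taken to be between the feature's activations and a neuron's activations over the same tokens, and the cosine similarity between two weight vectors associated with the feature and the neuron. A stdlib-only sketch with hypothetical helper names:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy activations over 5 tokens (made-up numbers).
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.8]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.5]
print(round(pearson(feature_acts, neuron_acts), 3))
print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 3))
```

Because the two measures look at different things (co-activation on data versus geometric direction), they need not agree, which is consistent with the table pairing correlations of +0.12 to +0.17 with cosine similarities near zero.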
Negative Logits
Și        -1.02
Ciò       -0.99
Queste    -0.98
Aún       -0.97
Przyp     -0.95
Mentre    -0.94
Gdy       -0.94
Infatti   -0.93
Puoi      -0.92
Cześć     -0.92
Positive Logits
↵↵        0.75
↵↵↵       0.74
<eos>     0.71
<bos>     0.69
Rollo     0.68
↵↵↵↵      0.66
ventus    0.63
↵↵↵↵↵     0.62
Schwal    0.62
Hano      0.61
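Both logit lists rank vocabulary tokens by how strongly this feature's output direction pushes their logits down or up. A common way to build such lists, assumed here rather than confirmed by the page, is to multiply the feature's decoder vector by the model's unembedding matrix and keep the k most negative and k most positive entries. A toy sketch; `logit_effects`, `W_U`, and the placeholder vocabulary are hypothetical:

```python
def logit_effects(decoder_vec, W_U, vocab, k=2):
    """Return the k most negative and k most positive token effects.

    decoder_vec: length d_model; W_U: d_model rows x len(vocab) columns.
    Effect on token t = sum over d of decoder_vec[d] * W_U[d][t].
    """
    effects = [
        sum(decoder_vec[d] * W_U[d][t] for d in range(len(decoder_vec)))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: effects[t])  # ascending
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Largest effects first, mirroring the Positive Logits list.
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy model: d_model = 2, vocabulary of 4 placeholder tokens (made-up numbers).
vocab = ["tok0", "tok1", "tok2", "tok3"]
W_U = [
    [0.5, -0.4, 0.6, -0.1],  # d_model dimension 0
    [0.1, -0.3, 0.2, -0.5],  # d_model dimension 1
]
neg, pos = logit_effects([1.0, 1.0], W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```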
Activation Density: 0.038%
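Activation density is usually the fraction of tokens in a sample on which the feature's activation is above zero; that definition and the zero threshold are assumptions here. Under it, 0.038% means the feature fires on about 38 of every 100,000 tokens. A minimal sketch (`activation_density` is a hypothetical helper):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: the feature fires on 2 of 8 tokens.
acts = [0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 1.3, 0.0]
print(f"{activation_density(acts):.3%}")  # prints 25.000%
```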