INDEX
Explanations
key phrases that challenge or define truths in a political context
Neuron Alignment
Index    Value    % of L₁
50       +0.18    0.6%
381      +0.10    0.3%
198      +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1419     +0.18       0.04
1426     +0.10       0.04
1757     +0.10       0.04
Negative Logits
<bos>         -1.71
Hahahahaha    -1.03
Hahah         -0.99
Cringe        -0.96
Fuckin        -0.93
milf          -0.90
/***          -0.90
Derp          -0.85
Ehh           -0.84
asf           -0.83
Positive Logits
bandung      1.19
Jambi        1.17
Karang       1.15
Minang       1.10
jawa         1.08
jaya         1.03
signora      1.01
papà         0.97
Palembang    0.95
öyle         0.95
Activations Density: 0.484%