Explanations
references to religious and philosophical concepts alongside discussions of abortion and women's rights
Neuron Alignment
Index   Value   % of L₁
50      +0.26   1.1%
394     +0.18   0.8%
468     +0.09   0.4%
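The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. That reading is an assumption, since the page does not define the column. A minimal sketch under that assumption follows; the function name `l1_share` and the toy vector are hypothetical, not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm."""
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("a zero vector has no L1 norm to take a share of")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, NOT the real feature weights.
toy_weights = [0.5, -0.25, 0.25, 0.0]

# The L1 norm is 0.5 + 0.25 + 0.25 + 0.0 = 1.0, so neuron 0 holds half of it.
share = l1_share(toy_weights, 0)
print(f"{share:.1%}")  # prints 50.0%
```

Under this reading, the small percentages in the table would mean that no single neuron dominates the feature direction.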
Correlated Neurons
Index   P. Corr.   Cos Sim.
1870    +0.26      0.09
394     +0.18      0.14
1856    +0.09      0.11
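The two columns above report different measures, which is why a neuron can rank first on one and last on the other. Assuming "P. Corr." is the Pearson correlation and "Cos Sim." is the cosine similarity, the sketch below shows how they differ. The page does not say which vectors are compared, so the inputs here are hypothetical toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after mean-centering."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy data: b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]

print(pearson_correlation(a, b))  # very close to 1: the offset is removed
print(cosine_similarity(a, b))    # below 1: the offset changes the angle
```

Pearson correlation ignores constant offsets while cosine similarity does not, so the two columns need not agree in rank order.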
Negative Logits
Token           Logit
<bos>           -1.88
ⓧ               -0.65
<?              -0.57
<blockquote>    -0.56
slf             -0.56
displayquote    -0.56
.               -0.56
engage          -0.55
enter           -0.55
earn            -0.54
Positive Logits
Token       Logit
bandung     1.54
Juf         1.44
sovere      1.39
Minang      1.39
lele        1.35
unwarran    1.31
Banjar      1.30
quoc        1.29
Intere      1.25
surabaya    1.24
Activations Density: 2.723%