INDEX
Explanations
phrases related to political or social commentary
Neuron Alignment
Index    Value    % of L₁
752      +0.22    0.8%
1967     +0.19    0.7%
50       +0.16    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
752      +0.22       0.06
16       +0.19       0.06
1967     +0.16       0.05
Negative Logits
łaści             -0.65
<bos>             -0.63
OrEmpty           -0.57
wikk              -0.56
marginVertical    -0.55
exemplaires       -0.55
:^{               -0.54
!("{              -0.53
<?                -0.53
gehouden          -0.52
Positive Logits
ohr       1.07
kram      1.05
gend      1.02
bahia     1.01
plak      0.99
marmor    0.98
alpes     0.97
lele      0.97
maer      0.96
uhr       0.94
Activations Density 0.336%