Explanations
proper nouns related to political candidates and events
Neuron Alignment
Index   Value   % of L₁
453     +0.16   0.5%
184     +0.16   0.5%
764     +0.15   0.5%
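The "% of L₁" column can be read as each entry's share of a vector's L₁ norm. The page does not state how it is computed, so the sketch below is only one plausible reading: it assumes the share is the absolute value of an entry divided by the sum of absolute values of all entries. The function name `percent_of_l1` and the example vector are hypothetical and are not taken from the page.

```python
def percent_of_l1(weights):
    """Return each entry's absolute value as a percentage of the L1 norm.

    Assumption: "% of L1" means 100 * |w_i| / sum_j |w_j|.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1_norm for w in weights]


# Hypothetical three-entry vector, chosen so the arithmetic is easy to check.
example = [0.5, -0.25, 0.25]
shares = percent_of_l1(example)
print(shares)
```

Under this reading, a value of about 0.16 at 0.5% would imply an L₁ norm of roughly 32 for the full vector, which suggests the feature's weight is spread over many neurons and not concentrated in a few.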
Correlated Neurons
Index   P. Corr.   Cos Sim.
453     +0.16      0.02
184     +0.16      0.01
137     +0.15      0.01
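The two columns above report different measures, which is why the same neuron can show a Pearson correlation of +0.16 but a cosine similarity near zero. The page does not say which vectors are compared in each column, so the sketch below only illustrates how the two measures differ on the same pair of vectors. The function names and the example data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical data: b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
r = pearson_correlation(a, b)
c = cosine_similarity(a, b)
print(r, c)
```

Pearson correlation ignores a constant offset, so the shifted pair is perfectly correlated while its cosine similarity stays below 1. The two numbers answer different questions and need not agree in magnitude.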
Negative Logits
ьаж             -0.89
Roskov          -0.59
ыгана           -0.57
Exacts          -0.57
>--}}           -0.53
UnsafeEnabled   -0.52
blurRadius      -0.51
calendriers     -0.51
Paglinawan      -0.51
vairāk          -0.49
Positive Logits
reluct          1.40
affor           1.39
indestru        1.32
impra           1.24
increa          1.21
depic           1.20
shenan          1.19
philanth        1.18
inconce         1.17
unve            1.14
Activations Density 0.024%