INDEX
Explanations
discussions or mentions of societal issues and controversies
Neuron Alignment
Index    Value    % of L₁
1253     +0.12    0.3%
1984     +0.09    0.2%
674      +0.09    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1984     +0.12       0.05
468      +0.09       0.04
1992     +0.09       0.02
Negative Logits
shenan       -1.13
apprehen     -1.02
unspeak      -1.02
sophistic    -0.92
reluct       -0.91
depic        -0.90
impra        -0.88
cuck         -0.86
hentai       -0.86
pooh         -0.85
Positive Logits
Singapur      0.78
ideolog       0.74
democra       0.71
balon         0.69
impon         0.69
Composição    0.68
meras         0.68
asyarakat     0.66
Conteúdo      0.66
solidar       0.65
Activations Density: 0.301%