INDEX
Explanations
percentages and statistics related to sociopolitical topics
Neuron Alignment
Index   Value   % of L₁
1473    +0.08   0.2%
1727    +0.07   0.2%
1198    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1002    +0.08      0.02
1436    +0.07      0.02
1238    +0.07      0.02
Negative Logits
unspeak      -0.85
gaily        -0.79
exasper      -0.77
disagre      -0.77
tolerably    -0.76
vainly       -0.76
shenan       -0.72
intersper    -0.72
nobly        -0.71
indescri     -0.70
Positive Logits
utop         0.81
gymnas       0.75
attes        0.70
spion        0.70
ideolog      0.69
BIBSYS       0.69
soggior      0.68
bronz        0.68
Wikisource   0.68
solidar      0.67
Activations Density: 0.041%