INDEX
Explanations
phrases related to significant political events and themes
Neuron Alignment
Index   Value   % of L₁
1870    +0.16   0.5%
1314    +0.11   0.3%
1013    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1984    +0.16      0.06
1870    +0.11      0.03
62      +0.10      0.04
Negative Logits
affor     -1.03
impra     -0.98
scrat     -0.95
desir     -0.93
increa    -0.93
purcha    -0.92
suscep    -0.92
snoopy    -0.91
strick    -0.90
peppa     -0.90
Positive Logits
politics     0.58
antigu       0.56
})*/         0.56
culture      0.56
discourse    0.55
affairs      0.55
history      0.54
society      0.54
colazione    0.51
});*/        0.50
Activations Density 0.445%