INDEX
Explanations
comments or statements in political contexts, including opinions and reactions from various individuals
Neuron Alignment
Index   Value   % of L₁
2019    +0.15   0.4%
604     +0.11   0.3%
605     +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
924     +0.15      0.04
605     +0.11      0.02
2019    +0.11      0.05
Negative Logits
impra       -1.47
shenan      -1.46
reluct      -1.41
suscep      -1.40
increa      -1.35
affor       -1.33
intermitt   -1.32
stickied    -1.29
hentai      -1.29
cushi       -1.26
Positive Logits
ParallelGroup   0.65
If              0.65
This            0.64
It              0.63
TRAILING        0.63
Inject          0.62
Sprintf         0.60
DoubleQuotes    0.60
Whether         0.59
Datuak          0.59
Activations Density: 0.188%