Explanations
phrases related to historical and political events
Neuron Alignment
Index   Value   % of L₁
50      +0.24   1.1%
1842    +0.20   0.9%
1108    +0.10   0.4%
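A minimal sketch of how a neuron-alignment table like this is commonly computed. It assumes (not stated on the page) that the feature's decoder vector lives in the MLP neuron basis, that neurons are ranked by signed weight, and that "% of L₁" is each weight's absolute value divided by the L1 norm of the whole decoder vector. The function name and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Return the k neurons with the largest (signed) weights in a feature's
    decoder vector, plus each weight's share of the vector's L1 norm in percent.

    Assumption: decoder_row is indexed by MLP neuron.
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by signed weight, largest first.
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1_norm) for i in ranked[:k]]


# Hypothetical 4-neuron decoder vector, for illustration only.
for index, value, pct in neuron_alignment([0.5, -0.1, 0.3, 0.1]):
    print(f"{index:<6}{value:+.2f}  {pct:.1f}%")
```

Under this reading, small percentages such as 1.1% suggest the feature's weight is spread over many neurons rather than concentrated in one.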
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1499    +0.24           0.11
1842    +0.20           0.07
50      +0.10           0.13
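A minimal sketch of the two statistics in this table, assuming (not confirmed by the page) that both are computed between the feature's activations and a neuron's activations over the same set of tokens: Pearson correlation on the mean-centred vectors, and cosine similarity as its uncentred analogue. The dashboard may instead compute the cosine similarity in weight space; the function names and toy activations are hypothetical.

```python
import math


def cosine_similarity(xs, ys):
    """Cosine of the angle between two raw (uncentred) vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm


def pearson_correlation(xs, ys):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return cosine_similarity([x - mean_x for x in xs], [y - mean_y for y in ys])


# Hypothetical activations over five tokens, for illustration only.
feature_acts = [0.0, 1.2, 0.0, 0.4, 2.0]
neuron_acts = [0.1, 0.9, -0.2, 0.3, 1.5]
print(f"{pearson_correlation(feature_acts, neuron_acts):+.2f}",
      f"{cosine_similarity(feature_acts, neuron_acts):.2f}")
```

Centring is the only difference between the two functions, which is why the two columns can disagree for the same neuron.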
Negative Logits
<bos>              -1.34
PMailer            -0.62
quias              -0.61
Chwiliwch          -0.60
RegressionTest     -0.59
beginnetje         -0.58
mphony             -0.57
esboço             -0.57
TagMode            -0.56
AllMovie           -0.56
Positive Logits
unspeak            1.40
shenan             1.38
reluct             1.36
apprehen           1.35
disagre            1.34
pamph              1.31
impra              1.31
sophistic          1.29
ineffec            1.25
uninten            1.25
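Positive and negative logit lists of this kind are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the highest- and lowest-scoring tokens. The sketch below assumes that approach and deliberately ignores the final LayerNorm and any unembedding bias; the function name, the 2-dimensional toy unembedding, and the four-token vocabulary are hypothetical.

```python
def logit_effects(decoder_dir, w_unembed, vocab, k=3):
    """Project a feature's decoder direction through an unembedding matrix.

    decoder_dir: list of d_model floats (the feature's output direction).
    w_unembed:   d_model rows, each with len(vocab) columns.
    Returns (top_k, bottom_k) lists of (token, logit) pairs.
    Assumption: no final LayerNorm or bias is applied.
    """
    logits = [
        sum(decoder_dir[j] * w_unembed[j][t] for j in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    top = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]   # most positive first
    bottom = sorted(pairs, key=lambda p: p[1])[:k]              # most negative first
    return top, bottom


# Hypothetical toy model: d_model = 2, vocabulary of four tokens.
vocab = ["a", "b", "c", "d"]
w_unembed = [[1.0, 0.0, -1.0, 2.0],
             [0.0, 1.0, 1.0, -1.0]]
top, bottom = logit_effects([1.0, 0.5], w_unembed, vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```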
Activation Density: 2.554%
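Activation density is usually the percentage of tokens on which the feature fires at all. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard's exact threshold is not stated); the function name and toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds threshold."""
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over five tokens, for illustration only.
print(f"{activation_density([0.0, 0.5, 0.0, 0.0, 1.2]):.3f}%")
```

Read this way, a density of 2.554% means the feature is active on roughly one token in 39 (100 / 2.554 ≈ 39.2).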