INDEX
Explanations
phrases related to political leaders and events
Neuron Alignment
Index   Value   % of L₁
394     +0.14   0.4%
2019    +0.12   0.4%
872     +0.11   0.3%
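
The "% of L₁" column is not defined on the page. One plausible reading, offered here only as an assumption, is that it reports each neuron's absolute weight as a share of the L₁ norm (sum of absolute values) of the feature's full weight vector over neurons. A minimal sketch under that assumption, using a hypothetical helper `l1_share` and a toy vector rather than the feature's real weights:

```python
def l1_share(weights):
    """Return each entry's absolute value as a fraction of the vector's L1 norm.

    Assumption: this mirrors what the "% of L1" column reports; the page
    itself does not state the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0 for _ in weights]
    return [abs(w) / l1_norm for w in weights]


# Hypothetical toy vector, not data from the dashboard.
toy_weights = [0.5, -0.25, 0.25]
shares = l1_share(toy_weights)
```

Under this reading, shares of only 0.3% to 0.4% for the top-aligned neurons would mean the feature's weight is spread thinly across many neurons rather than concentrated in a few.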
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.14      0.04
1961    +0.12      0.03
1261    +0.11      0.01
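
The two columns under Correlated Neurons measure different things. Assuming "P. Corr." is the Pearson correlation between two activation series and "Cos Sim." is the cosine similarity between two vectors (the page does not say which vectors are compared), the standard formulas can be sketched as follows. The function names and inputs are illustrative, not taken from the dashboard.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy inputs, not data from the dashboard.
corr = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
sim = cosine([1.0, 0.0], [0.0, 1.0])
```

Pearson correlation subtracts the mean before comparing, while cosine similarity does not, so the two values can differ considerably for the same pair, as they do in the table above.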
Negative Logits
artig      -0.69
ausp       -0.67
ordina     -0.67
viron      -0.65
Église     -0.65
siff       -0.64
cosi       -0.64
prostitu   -0.64
parati     -0.63
hecta      -0.63
Positive Logits
nothin     0.70
unspeak    0.69
indescri   0.66
shenan     0.63
there      0.61
ineffec    0.61
swee       0.61
laug       0.59
horrend    0.59
prolly     0.58
Activations Density: 0.257%