INDEX
Explanations
phrases related to bipartisan efforts
Neuron Alignment
Index   Value   % of L₁
1870    +0.14   0.5%
478     +0.11   0.4%
1325    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1363    +0.14      0.05
678     +0.11      0.06
538     +0.11      0.05
Negative Logits
confé       -0.68
kompati     -0.60
Konkur      -0.59
prédé       -0.58
abstrait    -0.55
Gouver      -0.55
gius        -0.54
الحره       -0.53
inconnu     -0.52
isolé       -0.52
Positive Logits
obstinate     0.63
xPos          0.63
blockSize     0.62
hornblende    0.62
posX          0.61
barbarous     0.61
pymysql       0.61
insensible    0.60
indignant     0.59
blackish      0.59
Activations Density: 0.352%