Explanations
political, ideological, and controversial terms or names
Neuron Alignment
Index   Value   % of L₁
50      +0.19   1.2%
2019    +0.07   0.4%
382     +0.06   0.4%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1156    +0.19           0.07
1262    +0.07           0.07
1624    +0.06           0.07
Negative Logits
Token    Logit
<bos>    -2.15
ⓧ        -1.34
/**      -1.22
         -1.16
<?       -1.11
<?       -1.06
/*       -1.04
/***     -0.97
///**    -0.95
<!--     -0.81
Positive Logits
Token        Logit
affor        1.23
véhic        1.19
maneu        1.14
santiago     1.11
toledo       1.11
Minang       1.10
lidl         1.10
stockholm    1.09
Juf          1.09
magis        1.08
Activations Density 0.373%