INDEX
Explanations
names and titles of individuals or organizations, with a particular focus on notable figures and leaders
Neuron Alignment
Index   Value   % of L₁
1343    +0.13   0.4%
605     +0.10   0.3%
1842    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.13      0.02
322     +0.10      0.01
545     +0.10      0.02
Negative Logits
CLUDING        -0.63
aisladas       -0.63
然后再         -0.62
CRITERIA       -0.59
IMPROVEMENT    -0.58
PROCESSES      -0.58
OBSERVATIONS   -0.56
mniej          -0.55
POLLUTION      -0.54
praca          -0.54
Positive Logits
viciss     1.16
accla      1.16
Abbé       1.15
ausp       1.14
embra      1.12
Simult     1.07
Chapitre   1.06
Sén        1.06
»>         1.06
quoique    1.03
Activations Density: 0.099%