INDEX
Explanations
references to important or leading figures in a context
Neuron Alignment
Index   Value   % of L₁
198     +0.12   0.3%
1253    +0.10   0.3%
1108    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
554     +0.12      0.05
1835    +0.10      0.04
1387    +0.08      0.04
Negative Logits
vété          -0.82
géant         -0.65
confé         -0.62
fantaisie     -0.58
gardien       -0.56
titulaire     -0.54
spécialisée   -0.53
voordelen     -0.53
fondateur     -0.52
aéri          -0.51
Positive Logits
ideolog   0.83
meis      0.79
territo   0.75
maig      0.74
doman     0.73
auctor    0.73
theolog   0.71
magis     0.70
erit      0.70
paff      0.70
Activations Density: 0.192%