Explanations
phrases related to specific names or titles
Neuron Alignment
Index    Value    % of L₁
505      +0.09    0.3%
1731     +0.09    0.3%
1385     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
505      +0.09       0.01
198      +0.09       0.03
1891     +0.08       0.01
Negative Logits
İstinadlar     -0.60
Manbalar       -0.54
weebly         -0.52
livejournal    -0.47
éclairage      -0.43
sidemargin     -0.43
Izvori         -0.43
Prijs          -0.43
NamedQuery     -0.43
Gemeinsame     -0.43
Positive Logits
guir        0.68
applau      0.67
sappi       0.66
jajaja      0.65
trás        0.65
apparti     0.63
Ottobre     0.62
ificance    0.62
érêt        0.62
sés         0.62
Activations Density: 0.277%