Explanations
Phrases related to technical or instructional content, possibly explaining proper techniques or procedures.
Neuron Alignment
Index   Value   % of L₁
184     +0.36   1.3%
1343    +0.21   0.8%
227     +0.17   0.6%
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
184     +0.36           0.03
137     +0.21           0.04
1343    +0.17           0.03
Negative Logits
Sarm        -0.63
Pyrene      -0.63
Middles     -0.62
philanth    -0.60
Heeren      -0.60
Abbé        -0.56
Hano        -0.56
Philadel    -0.55
emigrants   -0.55
rasc        -0.55
Positive Logits
archiviato     0.53
WriteBarrier   0.50
∖              0.48
Wikimédia      0.46
omock          0.46
AVEC           0.46
AddTagHelper   0.45
custos         0.44
مُعرِّف        0.44
Synonymes      0.44
Activation Density: 0.155%