INDEX
Explanations
the beginning of a text or article
Neuron Alignment
Index    Value    % of L₁
184      +0.32    1.3%
394      +0.31    1.3%
674      +0.31    1.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
184      +0.32       0.03
394      +0.31       0.05
609      +0.31       0.03
Negative Logits
embodi     -1.05
depic      -0.90
compen     -0.88
maneu      -0.87
emphat     -0.87
unve       -0.86
berea      -0.85
uniqu      -0.85
fischer    -0.84
Kün        -0.84
Positive Logits
<bos>              0.68
IsContent          0.66
DoubleQuotes       0.63
WriteTagHelper     0.61
isContained        0.61
ConstraintMaker    0.59
ContentAsync       0.59
ModelExpression    0.58
IsMutable          0.58
ویکیپدی            0.57
Activations Density 0.444%