INDEX
Explanations
the conditional word "if"
Neuron Alignment
Index   Value   % of L₁
410     +0.14   0.8%
268     +0.12   0.6%
50      +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
236     +0.14      0.11
21      +0.12      0.07
188     +0.10      0.07
Negative Logits
chester       -1.72
credibility   -1.57
idian         -1.56
deterg        -1.50
antigens      -1.49
>&            -1.45
vote          -1.44
¿½            -1.44
erred         -1.41
atrix         -1.40
Positive Logits
simpl        1.52
Photograph   1.51
glass        1.50
leine        1.49
zes          1.47
shoot        1.47
own          1.44
example      1.44
objection    1.42
ení          1.41
Activations Density: 3.696%