Explanations
phrases describing past habits or experiences
Neuron Alignment
Index    Value    % of L₁
50       +0.23    0.9%
517      +0.10    0.4%
1363     +0.09    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
646      +0.23       0.04
517      +0.10       0.04
499      +0.09       0.04
Negative Logits
<bos>           -2.32
displayquote    -0.74
@               -0.74
quad            -0.73
public          -0.70
///             -0.69
///**           -0.68
ള്ള             -0.68
aligned         -0.66
//              -0.66
Positive Logits
maneu           2.16
affor           2.06
accla           2.02
increa          2.01
reluct          1.95
depic           1.93
unlaw           1.91
disagre         1.90
inev            1.88
Juf             1.87
Activations Density 0.455%