Explanations
phrases related to explaining or indicating something
Neuron Alignment
Index   Value   % of L₁
872     +0.08   0.2%
674     +0.08   0.2%
1355    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1355    +0.08      0.04
1889    +0.08      0.04
1021    +0.07      0.04
Negative Logits
reluct      -1.96
unden       -1.96
encomp      -1.88
increa      -1.88
disagre     -1.88
fta         -1.88
guarante    -1.87
wherea      -1.85
secon       -1.84
inev        -1.84
Positive Logits
<bos>            1.23
JpaRepository    0.69
normal           0.67
ResponseWriter   0.66
HtmlAttribute    0.65
complexContent   0.64
XMLSchema        0.64
createState      0.63
harmless         0.63
正常             0.62
Activations Density: 0.433%