INDEX
Explanations
phrases related to changing one's mind or decision-making processes based on new information or arguments
Neuron Alignment
Index   Value   % of L₁
74      +0.09   0.2%
332     +0.08   0.2%
344     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
332     +0.09      0.04
74      +0.08      0.03
284     +0.08      0.04
Negative Logits
increa      -1.95
inev        -1.91
affor       -1.90
guarante    -1.89
volunte     -1.88
disagre     -1.87
depic       -1.87
accla       -1.86
encomp      -1.86
snoopy      -1.85
Positive Logits
kasarigan        0.83
stance           0.70
parsedMessage    0.66
regarding        0.65
decision         0.64
about            0.63
Nullable         0.63
Nonnull          0.63
forChild         0.62
awtextra         0.62
Activations Density 0.269%