INDEX
Explanations
phrases related to uncertainty and conditions that may or may not occur
Neuron Alignment
Index   Value   % of L₁
971     +0.11   0.3%
303     +0.10   0.3%
1053    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
161     +0.11      0.04
971     +0.10      0.04
303     +0.10      0.04
Negative Logits
emphat      -0.96
desir       -0.94
inconce     -0.93
disagre     -0.91
effe        -0.89
increa      -0.87
fuf         -0.87
unwarran    -0.86
suspic      -0.86
perfet      -0.85
Positive Logits
will               0.71
will               0.70
Will               0.61
sẽ                 0.58
Will               0.57
gawas              0.57
Dichloroethane     0.52
WILL               0.51
Dichlorobenzene    0.50
queline            0.50
Activations Density: 0.290%