INDEX
Explanations
questions and uncertainty in phrases addressing moral and ethical dilemmas
Neuron Alignment
Index   Value   % of L₁
1978    +0.11   0.3%
674     +0.10   0.3%
190     +0.08   0.2%
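The page does not define the "% of L₁" column. One plausible reading, offered here only as an assumption, is that it is the absolute value of a single entry divided by the sum of absolute values of the whole vector. The sketch below shows that calculation on a made-up vector; `l1_share` and `toy` are hypothetical names, and the numbers are not taken from this feature.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means one entry's share of the sum of
    absolute values. The source page does not confirm this.
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("L1 norm is zero, share is undefined")
    return abs(weights[index]) / total


# Hypothetical toy vector, not the real feature weights.
toy = [0.5, -0.25, 0.25]
share = l1_share(toy, 0)
print(f"{share:.1%}")
```

Under this reading, small shares such as those in the table would mean the feature's weight is spread across many neurons rather than concentrated in a few.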
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1919    +0.11           0.07
862     +0.10           0.04
207     +0.08           0.06
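The two similarity columns can be illustrated with a minimal sketch. It assumes each score compares two equal-length numeric sequences; the page does not say which quantities are compared (activations, weight vectors, or something else), so the inputs here are invented. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Raises ZeroDivisionError if either vector has zero norm.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity
    of the two sequences after subtracting their means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Invented example data, not taken from the dashboard.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
print(pearson_correlation(u, v), cosine_similarity(u, v))
```

The two measures differ only in the mean subtraction, which is why they can disagree when the compared sequences have nonzero means.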
Negative Logits
unlaw        -0.90
updateTime   -0.86
Keny         -0.84
Juf          -0.83
Khart        -0.83
pageNo       -0.81
impractica   -0.81
Punj         -0.79
Hez          -0.78
pamph        -0.76
Positive Logits
<bos>            0.91
дописавши        0.53
Paglinawan       0.53
原始内容存档于    0.52
InBytes          0.50
وما              0.50
外部連結          0.50
done             0.49
happen           0.49
::<              0.48
Activations Density 0.237%
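As a rough illustration of what a density figure like this could mean, the sketch below computes the fraction of positions at which a feature's activation exceeds a threshold. This definition is an assumption, since the page gives only the number; `activation_density`, its `threshold` parameter, and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation is above `threshold`.

    Assumption: density counts strictly positive activations over
    all sampled positions. The source page does not define it.
    """
    if not activations:
        raise ValueError("no activations given")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Invented sample: the feature fires on 1 of 4 positions.
sample = [0.0, 0.0, 0.5, 0.0]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density well below 1% would indicate a sparse feature that fires on only a small share of tokens.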