INDEX
Explanations
references to demonic entities or evil figures
Neuron Alignment
Index   Value   % of L₁
1328    +0.14   0.5%
501     +0.12   0.4%
303     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1328    +0.14      0.02
501     +0.12      0.02
966     +0.11      0.02
Negative Logits
للاسماء   -0.61
InvalidProtocol   -0.60
censiti   -0.59
Datuak   -0.56
esternos   -0.53
PeEnEo   -0.52
Obrador   -0.51
citenamefont   -0.50
mistak   -0.50
Referencie   -0.49
Positive Logits
Devil   0.98
Devil   0.96
devil   0.94
exé   0.93
devils   0.91
devil   0.88
Demon   0.81
demon   0.80
«<   0.80
»>   0.79
Activations Density: 0.108%