INDEX
Explanations
references to attacks or threats of violence
Neuron Alignment
Index   Value   % of L₁
1520    +0.18   0.7%
1296    +0.12   0.5%
1085    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1520    +0.18      0.06
1778    +0.12      0.04
1296    +0.12      0.04
Negative Logits
للاسماء            -0.52
JspWriter          -0.48
setPointSize       -0.46
prome              -0.46
GTCX               -0.46
UniformLocation    -0.44
Décès              -0.43
FunctionFlags      -0.42
Grund              -0.42
beig               -0.42
Positive Logits
attack       1.25
Attack       1.16
attack       1.14
attacks      1.10
Attacks      1.08
Attack       1.04
Attacks      1.00
attacks      0.99
attacked     0.98
attacking    0.97
Activations Density: 0.107%