INDEX
Explanations
instructions related to physical self-defense techniques or tactics
Neuron Alignment
Index   Value   % of L₁
2034    +0.19   0.5%
1415    +0.11   0.3%
381     +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
62      +0.19      0.04
1229    +0.11      0.03
1265    +0.11      0.03
Negative Logits
fta      -2.65
inev     -2.61
emphat   -2.60
secon    -2.59
accla    -2.58
depic    -2.57
embra    -2.55
squa     -2.54
dises    -2.53
ftu      -2.50
Positive Logits
try        1.17
don        1.11
you        1.08
please     1.08
consider   1.07
remember   1.04
let        1.02
make       1.02
choose     1.00
try        1.00
Activations Density 0.304%