INDEX
Explanations
phrases related to legal and criminal actions
Neuron Alignment
Index   Value   % of L₁
1042    +0.09   0.2%
604     +0.08   0.2%
453     +0.08   0.2%
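
The "% of L₁" column is not defined on this page. A minimal sketch of one plausible reading follows, assuming it is each neuron's absolute value divided by the L₁ norm of the whole feature direction. The function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(direction, top_k=3):
    """Return (index, value, share_of_l1) for the top_k largest-magnitude entries.

    Assumption: "% of L1" means |value| / sum(|all values|).
    """
    l1 = sum(abs(v) for v in direction)
    # Sort indices by descending magnitude; sorted() is stable, so ties keep index order.
    order = sorted(range(len(direction)), key=lambda i: -abs(direction[i]))[:top_k]
    return [(i, direction[i], abs(direction[i]) / l1) for i in order]


# Hypothetical toy direction, not the real feature vector.
toy = [0.5, -0.25, 0.125, 0.125]
rows = l1_share(toy, top_k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a share of 0.2% for each of the top neurons would indicate that the feature is spread across many neurons rather than aligned with a few.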
Correlated Neurons
Index   P. Corr.   Cos Sim.
1395    +0.09      0.04
1309    +0.08      0.03
1128    +0.08      0.06
Negative Logits
Token       Logit
aen         -1.32
increa      -1.32
desir       -1.25
thut        -1.24
fluo        -1.22
effe        -1.21
inev        -1.20
guarante    -1.20
nutella     -1.18
impra       -1.17
Positive Logits
Token              Logit
disciplinary       0.74
CreateTagHelper    0.61
suspension         0.61
disciplinary       0.57
misconduct         0.56
punishment         0.54
suspended          0.54
impeachment        0.54
DeleteBehavior     0.53
للاسماء            0.52
Activations Density 0.734%
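
The density figure is not defined on this page. As a rough sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero, it could be computed as below. The name `activation_density`, the `threshold` parameter, and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold.

    Assumption: "density" counts a token as active when its activation > 0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations, not real dashboard data.
sample = [0.0, 0.0, 1.5, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

On that reading, a density of 0.734% would mean the feature fires on fewer than one in a hundred sampled tokens.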