INDEX
Explanations
phrases related to legal definitions and consequences of actions
Neuron Alignment
Index   Value   % of L₁
156     +0.32   1.8%
221     +0.12   0.7%
479     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
74      +0.32      0.05
82      +0.12      0.05
26      +0.10      0.03
Negative Logits
arynge      -1.91
lication    -1.77
because     -1.70
го          -1.68
reement     -1.66
suffix      -1.66
falls       -1.64
procedure   -1.56
arlier      -1.51
aré         -1.50
Positive Logits
ħ           2.10
remotely    1.76
ĭ           1.73
ļ           1.61
ĵ           1.56
·¸          1.53
¾           1.53
ĸ´          1.50
Evan        1.48
access      1.48
Activations Density: 2.512%