Explanations
References to legal or justice-related systems (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
1705    +0.18   0.7%
1870    +0.13   0.5%
169     +0.12   0.4%
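This sketch shows one plausible way the Neuron Alignment table could be computed. It rests on assumptions that the dashboard does not state. Value is taken to be the feature's decoder weight on each MLP neuron. % of L₁ is taken to be that weight's magnitude as a share of the L₁ norm of the full decoder row. The helper name neuron_alignment is hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Return the k neurons with the largest (signed) decoder weights.

    Hypothetical helper. Assumes decoder_row[i] is the feature's decoder
    weight on neuron i, and reports each weight's |w_i| as a percentage of
    the row's L1 norm.
    """
    l1 = sum(abs(w) for w in decoder_row)
    if l1 == 0:
        raise ValueError("decoder row is all zeros")
    # Sort neuron indices by signed weight, largest first (stable on ties).
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1) for i in ranked[:k]]


print(neuron_alignment([0.25, -0.5, 1.0, 0.25], k=2))  # [(2, 1.0, 50.0), (0, 0.25, 12.5)]
```

Under this reading, a weight of +0.18 at 0.7% would imply a decoder row whose L₁ norm is roughly 0.18 / 0.007 ≈ 26.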
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1705    +0.18           0.06
188     +0.13           0.04
1438    +0.12           0.03
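The correlation column presumably reports the Pearson correlation between the feature's activations and each neuron's activations over the same tokens. The sketch below ranks neurons on that basis. The helper names and the dict-of-lists input layout are hypothetical. The cosine-similarity column is not reproduced, because the dashboard does not say which vectors it compares.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant input: correlation is undefined, treat as 0
    return cov / (sx * sy)


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature (hypothetical helper).

    neuron_acts maps a neuron index to that neuron's activations on the
    same tokens as feature_acts.
    """
    scores = [(i, pearson(feature_acts, acts)) for i, acts in neuron_acts.items()]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:k]


feature = [0.0, 1.0, 2.0, 3.0]
neurons = {7: [0.0, 2.0, 4.0, 6.0], 3: [3.0, 2.0, 1.0, 0.0], 5: [1.0, 0.0, 1.0, 0.0]}
print(correlated_neurons(feature, neurons, k=2))
```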
Negative Logits
Token       Logit
inev        -1.50
intersper   -1.45
fta         -1.45
increa      -1.43
depic       -1.42
squa        -1.42
secon       -1.39
aen         -1.39
?...        -1.38
disagre     -1.36
Positive Logits
Token     Logit
system    1.44
system    1.37
System    1.33
System    1.29
systems   1.26
SYSTEM    1.22
SYSTEM    1.21
ystem     1.18
Systems   1.17
systems   1.17
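Token lists like these are commonly produced by projecting the feature's decoder direction onto the model's unembedding and keeping the lowest- and highest-scoring tokens. That method is an assumption here, since the dashboard does not state how the values were computed, and the sketch ignores any final layer norm. The helper name logit_effects and the row-per-token unembed layout are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Score tokens by the dot product of the feature's decoder direction
    with each token's unembedding vector (hypothetical layout: unembed[t]
    is the vector for vocab[t]).

    Returns (most_negative, most_positive), each a list of (token, score).
    """
    scores = [
        (tok, sum(d * u for d, u in zip(decoder_dir, row)))
        for tok, row in zip(vocab, unembed)
    ]
    most_negative = sorted(scores, key=lambda s: s[1])[:k]
    most_positive = sorted(scores, key=lambda s: s[1], reverse=True)[:k]
    return most_negative, most_positive


neg, pos = logit_effects([1.0, 0.0], [[2.0, 5.0], [-1.0, 0.0], [0.5, 1.0]], ["system", "inev", "law"], k=1)
print(neg, pos)  # [('inev', -1.0)] [('system', 2.0)]
```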
Activation Density: 0.062%
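This sketch reads activation density as the fraction of dataset tokens on which the feature's activation is strictly positive. The zero threshold is an assumption. At 0.062%, the feature would fire on about 62 of every 100,000 tokens. The helper name activation_density is hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold (assumed 0)."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


acts = [0.0] * 99_938 + [1.0] * 62  # 100,000 tokens, 62 of them active
print(f"{activation_density(acts):.3%}")  # prints 0.062%
```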