INDEX
Explanations
discussions related to laws and regulations, particularly those involving controversial or sensitive topics
Neuron Alignment
Index   Value   % of L₁
1376    +0.14   0.5%
479     +0.12   0.4%
1961    +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1376    +0.14      0.04
1105    +0.12      0.04
437     +0.12      0.03
Negative Logits
logis       -0.68
dante       -0.63
akade       -0.63
radikal     -0.61
kritis      -0.60
aen         -0.60
minimalis   -0.59
bera        -0.58
republi     -0.58
naer        -0.57
Positive Logits
laws   1.34
Laws   1.21
law    1.20
Law    1.06
laws   1.05
Laws   1.05
Law    1.04
law    1.01
LAW    0.99
LAWS   0.99
Activations Density 0.075%