INDEX
Explanations
statements related to laws, legal recommendations, and government appointments
Neuron Alignment
Index   Value   % of L₁
872     +0.19   0.6%
1499    +0.09   0.3%
1297    +0.08   0.2%
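The "% of L₁" column can be read as the share of a vector's L1 norm carried by a single neuron. The source does not define the column, so the sketch below is only one plausible reading: it assumes the percentage is |w_i| divided by the sum of |w_j| over all neurons. The function name `l1_share` and the toy vector are hypothetical and are not the real feature weights.

```python
def l1_share(weights, index):
    """Return the fraction of the L1 norm contributed by weights[index].

    Assumption: "% of L1" means |w_i| / sum_j |w_j|.
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("weight vector has zero L1 norm")
    return abs(weights[index]) / total


# Hypothetical toy vector, chosen only to make the arithmetic easy to follow.
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")  # prints 50.0%
```

Under this reading, small percentages such as those in the table would indicate that the feature is spread over many neurons rather than concentrated in one.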
Correlated Neurons
Index   P. Corr.   Cos Sim.
872     +0.19      0.07
1380    +0.09      0.03
488     +0.08      0.06
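The two columns above report different similarity measures. As a minimal sketch, assuming "P. Corr." stands for Pearson correlation and "Cos Sim." for cosine similarity, the code below shows how the two differ: Pearson correlation is cosine similarity applied after subtracting each vector's mean. The source does not say which vectors are compared (activations, weights, or something else), so the inputs here are hypothetical toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy vectors: perfectly anti-correlated, yet the raw vectors
# still point in broadly the same direction because all entries are positive.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
print(round(pearson_correlation(u, v), 3))
print(round(cosine_similarity(u, v), 3))
```

The toy example illustrates why the two columns need not agree: centering removes any shared offset, so the measures can differ in magnitude and even in sign.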
Negative Logits
emphat      -1.19
depic       -1.09
hairc       -1.07
hentai      -1.07
desir       -1.07
guarante    -1.06
pixar       -1.06
nutella     -1.05
intersper   -1.03
hasbro      -1.03
Positive Logits
laws          0.85
rules         0.78
law           0.73
regulations   0.73
rules         0.69
rule          0.66
legislation   0.66
enforcement   0.65
laws          0.62
Rules         0.61
Activations Density 0.671%
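Activation density is commonly understood as the fraction of tokens on which a feature fires. The sketch below assumes that reading, with "fires" meaning an activation strictly above a threshold of zero; the source does not state the exact definition or threshold used. The function name `activation_density` and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations: 2 of 8 tokens are active.
toy_activations = [0.0, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4]
print(f"{activation_density(toy_activations):.3%}")  # prints 25.000%
```

Under this reading, a density of 0.671% would mean the feature is active on roughly 7 of every 1,000 tokens in whatever sample was measured.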