Explanations
phrases related to legal documents and government actions
Neuron Alignment
Index   Value   % of L₁
1499    +0.10   0.3%
1120    +0.10   0.3%
396     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1499    +0.10      0.04
1030    +0.10      0.02
300     +0.08      0.03
Negative Logits
Token       Logit
hairc       -1.02
intersper   -0.80
hentai      -0.78
indescri    -0.77
milf        -0.77
funko       -0.76
embodi      -0.76
amigurumi   -0.73
depic       -0.73
emphat      -0.73
Positive Logits
Token       Logit
January     0.54
effective   0.51
Vidite      0.49
January     0.49
February    0.48
effective   0.48
accogli     0.48
Effective   0.48
kela        0.47
生效        0.47
Activations Density: 0.241%