INDEX
Explanations
legal and bureaucratic language related to specific cases or issues
Neuron Alignment
Index   Value   % of L₁
1565    +0.17   0.6%
645     +0.13   0.5%
1839    +0.13   0.5%
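One possible reading of the Neuron Alignment table is offered here as an assumption, not as the dashboard's documented definition. Under this reading, each row is one entry of the feature's direction vector in neuron space. "Value" is that entry's signed weight, and "% of L₁" is its absolute value divided by the vector's L₁ norm. The Python sketch below builds such a table from a made-up vector. The function name `neuron_alignment` is hypothetical, and so is the choice to rank entries by magnitude.

```python
# Minimal sketch (assumption): rank entries of a feature's direction vector by
# magnitude and report each entry's share of the vector's L1 norm.
# The helper name and the toy vector are hypothetical.

def neuron_alignment(direction, top_k=3):
    """Return (index, value, fraction_of_l1) for the top_k largest-|value| entries."""
    l1 = sum(abs(w) for w in direction)
    if l1 == 0:
        return []  # an all-zero direction has no meaningful alignment
    ranked = sorted(range(len(direction)), key=lambda i: abs(direction[i]), reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in ranked[:top_k]]


if __name__ == "__main__":
    toy_direction = [0.01, -0.02, 0.17, 0.0, 0.13, -0.05]  # hypothetical weights
    print("Index   Value   % of L1")
    for idx, value, frac in neuron_alignment(toy_direction):
        print(f"{idx:<7} {value:+.2f}   {frac:.1%}")
```

Ranking by absolute value lets strongly negative weights appear. If the real dashboard ranks by signed value, the `key` would change accordingly.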
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1565    +0.17           0.04
636     +0.13           0.03
645     +0.13           0.03
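The two columns of the Correlated Neurons table are standard statistics. The sketch below assumes both are computed over the same pair of per-token activation vectors, one for the feature and one for the neuron. Under that assumption, Pearson correlation is the cosine similarity of the mean-centered vectors, while plain cosine similarity skips the centering. The dashboard text does not say which vectors are actually compared, and the helper names and toy data are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Uncentered cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # undefined for a zero vector; this sketch reports 0
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


if __name__ == "__main__":
    # Hypothetical activations of a feature and a neuron over the same five tokens.
    feature_acts = [0.0, 0.0, 1.2, 0.0, 0.8]
    neuron_acts = [0.1, -0.1, 0.9, 0.0, 0.5]
    print(f"Pearson Corr. {pearson_correlation(feature_acts, neuron_acts):+.2f}")
    print(f"Cos Sim.      {cosine_similarity(feature_acts, neuron_acts):.2f}")
```

The two numbers can diverge sharply. Centering removes any shared offset, so a sparse feature and a neuron with a nonzero mean can have a modest Pearson correlation but a different cosine similarity.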
Negative Logits
roh      -0.63
prouve   -0.59
kula     -0.58
kef      -0.58
kuh      -0.57
kona     -0.57
uhr      -0.56
bali     -0.56
kud      -0.55
augus    -0.55
Positive Logits
matter     1.28
matter     1.28
Matter     1.20
Matter     1.20
matters    1.15
matters    1.08
MATTER     1.08
Matters    1.06
mattered   0.97
Matters    0.93
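A common way to produce positive and negative logit lists like these is to project a feature's output (decoder) direction through the model's unembedding matrix, then read off the highest- and lowest-scoring tokens. The sketch below assumes exactly that and ignores any normalization the real pipeline may apply. The function name `logit_effects`, the tiny vocabulary, and the unembedding values are all hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab, k=2):
    """Score every token by decoder_dir @ unembed; return top-k and bottom-k.

    decoder_dir: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each a list of len(vocab) floats (shape [d_model, vocab]).
    """
    scores = [
        sum(decoder_dir[d] * unembed[d][t] for d in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    k = max(0, min(k, len(vocab)))
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending score
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[len(order) - k:])]
    return positive, negative


if __name__ == "__main__":
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]  # hypothetical tokens
    decoder_dir = [1.0, -0.5]                      # hypothetical direction, d_model = 2
    unembed = [
        [1.0, 0.0, -1.0, 0.5],
        [0.0, 1.0, 0.0, -0.4],
    ]
    positive, negative = logit_effects(decoder_dir, unembed, vocab)
    print("Positive:", [(tok, round(s, 2)) for tok, s in positive])
    print("Negative:", [(tok, round(s, 2)) for tok, s in negative])
```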
Activations Density 0.064%
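Activation density is usually read as the fraction of tokens on which the feature fires. This sketch assumes "fires" means an activation above zero, with the threshold exposed as a parameter. On that reading, 0.064% corresponds to 64 active tokens out of every 100,000. The code reproduces that arithmetic on synthetic data, and the function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Synthetic data: 64 active tokens among 100,000.
    acts = [0.0] * 99_936 + [1.0] * 64
    print(f"Activations Density {activation_density(acts):.3%}")
```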