Explanations
proper nouns and names, possibly related to legal cases or parties involved
Neuron Alignment
Index   Value   % of L₁
369     +0.24   1.4%
410     +0.16   0.9%
271     +0.13   0.7%
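The Neuron Alignment table is most naturally read the way common SAE feature dashboards use it. Each row is a neuron with a large weight in the feature's decoder vector, and "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole vector. This reading is an assumption, since the document does not define the columns. It is consistent with the rows above, which all imply an L₁ norm of roughly 17 (for example 0.24 / 0.014 ≈ 17). The sketch below uses a hypothetical toy vector rather than this feature's weights.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k neurons whose weights in a feature's decoder vector are
    largest in magnitude, as (index, signed weight, share of the L1 norm).

    Assumption: "% of L1" is |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Rank neuron indices by absolute weight, largest first.
    top = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)[:k]
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in top]


# Hypothetical toy decoder vector over six neurons (not this feature's weights).
toy = [0.0, 0.24, -0.05, 0.16, 0.13, 0.02]
for index, value, share in neuron_alignment(toy):
    print(f"{index}  {value:+.2f}  {share:.1%}")
```

Ranking by absolute value allows large negative weights to appear as well. The explicit "+" signs in the Value column suggest the dashboard does the same, but that is also an assumption.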
Correlated Neurons
Index   P. Corr.   Cos Sim.
369     +0.24      0.07
410     +0.16      0.10
271     +0.13      0.02
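The Correlated Neurons table pairs two similarity measures. Here we assume, as in common SAE dashboards, that P. Corr. is the Pearson correlation between the feature's and the neuron's activations across tokens. We also assume Cos Sim. is a cosine similarity, although the document does not say between which vectors. Under those assumptions, the sketch below shows how the two measures relate: Pearson correlation is cosine similarity after subtracting each vector's mean, so the two can differ sharply. The activation lists are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: the cosine similarity of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical activations of one feature and one neuron over the same tokens.
feature_acts = [1.0, 2.0, 3.0]
neuron_acts = [3.0, 2.0, 1.0]
print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity(feature_acts, neuron_acts))
```

For these lists the Pearson correlation is -1, while the cosine similarity is 10/14 ≈ 0.71. This is why the two columns need not agree in sign or size.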
Negative Logits
oretic       -1.88
onical       -1.68
reland       -1.67
ixel         -1.66
etary        -1.63
etimes       -1.63
generation   -1.62
xico         -1.58
ESULT        -1.57
sible        -1.56
Positive Logits
ĺ            3.65
ī            3.53
Ń            3.51
ij           3.43
ı            3.38
§            3.37
½            3.35
·            3.32
ľ            3.27
Ĭ            3.26
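The Negative and Positive Logits lists conventionally show the vocabulary tokens whose output logits the feature most suppresses or promotes. These are obtained by projecting the feature's decoder direction onto the model's unembedding matrix. That reading, and the omission of any final LayerNorm or scaling, are assumptions; the document does not state how the values were computed. The entries appear to be raw tokenizer pieces, which is why many are word fragments. The toy vectors below are hypothetical.

```python
def logit_effects(decoder_vec, unembed, k=3):
    """Rank tokens by the dot product of a feature's decoder direction with each
    token's unembedding vector (a direct-logit estimate that ignores any final
    LayerNorm). Returns (k most negative, k most positive) as (token, score) lists.

    `unembed` is a hypothetical dict mapping token string -> unembedding vector.
    """
    scores = {
        token: sum(a * b for a, b in zip(decoder_vec, vec))
        for token, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:k], ranked[::-1][:k]


# Hypothetical 2-dimensional toy model with four tokens.
decoder = [1.0, -1.0]
unembed = {"a": [2.0, 0.0], "b": [0.0, 2.0], "c": [1.0, 1.0], "d": [3.0, -1.0]}
negative, positive = logit_effects(decoder, unembed, k=2)
print("negative:", negative)
print("positive:", positive)
```

In a real model, `unembed` would hold one vector per vocabulary entry. Taking the k lowest and k highest scores then gives two lists shaped like the ones above.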
Activation Density: 0.395%
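Activation density is most naturally read as the share of tokens in a sample on which the feature's activation is above zero. On that reading, a density of 0.395% means the feature fires on roughly 4 tokens in every 1,000. The zero threshold and the token sample are assumptions, since the document does not state them. The sketch below computes the figure from a hypothetical list of activations.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature fires, i.e. activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical activations of one feature over five tokens.
print(f"{activation_density([0.0, 0.0, 1.2, 0.0, 0.3]):.3%}")
```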