INDEX
Explanations
specific names and terms related to politics, social issues, and health care
Neuron Alignment
Index   Value   % of L₁
111     +0.20   1.1%
254     +0.14   0.8%
320     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
258     +0.20      0.14
345     +0.14      0.09
71      +0.12      0.13
Negative Logits
yours     -1.72
>(        -1.67
lickr     -1.58
>()       -1.58
theirs    -1.57
liking    -1.56
.).       -1.56
ours      -1.47
"));      -1.46
>",       -1.42
Positive Logits
ium               1.58
sson              1.47
administration    1.45
Republic          1.43
weather           1.34
stown             1.34
ÂĴ                1.33
amounts           1.31
Enterprise        1.30
sein              1.29
Activations Density: 1.919%