INDEX
Explanations
references to political and government related topics, such as regulations, measures, and policy decisions
Neuron Alignment
Index   Value   % of L₁
776     +0.12   0.4%
795     +0.11   0.4%
1233    +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
776     +0.12      0.03
991     +0.11      0.02
795     +0.11      0.02
Negative Logits
rcParams    -0.46
]=="        -0.46
wodurch     -0.45
ouvriers    -0.44
gezien      -0.42
thicket     -0.42
RTEE        -0.42
насељу      -0.41
Grecian     -0.39
RTLR        -0.39
Positive Logits
ixante      0.90
seventy     0.83
eighty      0.82
ninety      0.81
thirty      0.80
twenty      0.80
thirty      0.79
fifty       0.79
priva       0.78
sixty       0.78
Activations Density: 0.058%