INDEX
Explanations
information and opinions related to politics and societal issues
Neuron Alignment
Index    Value    % of L₁
1967     +0.12    0.3%
1537     +0.08    0.2%
752      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1076     +0.12       0.04
98       +0.08       0.04
1340     +0.08       0.04
Negative Logits
thut        -0.83
fta         -0.76
ftu         -0.75
»>          -0.75
quitted     -0.75
purcha      -0.75
poff        -0.73
guarante    -0.72
feen        -0.72
tranf       -0.71
Positive Logits
years        0.76
decades      0.74
centuries    0.63
awhile       0.61
YEARS        0.60
months       0.58
Years        0.57
years        0.57
decade       0.55
Depuis       0.55
Activations Density: 0.159%