INDEX
Explanations
dates and news article metadata
Neuron Alignment
Index   Value   % of L₁
200     +0.10   0.3%
1053    +0.08   0.2%
585     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
283     +0.10      0.02
1343    +0.08      0.04
724     +0.07      0.03
Negative Logits
vų          -0.70
HideFlags   -0.66
Shetterly   -0.64
çadas       -0.64
engeance    -0.64
éndole      -0.63
стаття      -0.62
Estou       -0.60
čiu         -0.60
Jornal      -0.58
Positive Logits
tos        2.29
affor      1.83
reluct     1.79
disagre    1.79
fuf        1.75
guarante   1.72
fup        1.72
maneu      1.72
ftu        1.68
increa     1.67
Activations Density: 0.314%