INDEX
Explanations
references to organizations, political events, and actions
Neuron Alignment
Index   Value   % of L₁
344     +0.10   0.3%
1135    +0.09   0.3%
1870    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1071    +0.10      0.06
1662    +0.09      0.07
155     +0.09      0.05
Negative Logits
fordable     -0.63
intersper    -0.62
disreg       -0.60
downvotes    -0.59
lorenzo      -0.59
hmmmm        -0.59
javier       -0.58
impra        -0.57
encomp       -0.57
eyel         -0.57
Positive Logits
akut            0.58
because         0.57
antik           0.55
because         0.55
optik           0.55
unless          0.54
altogether      0.52
DataSnapshot    0.52
Pozdrawiam      0.52
ModelForm       0.51
Activations Density 0.662%