Explanations
references to community engagement, political activities, and social issues
Neuron Alignment
Index   Value   % of L₁
1842    +0.17   0.5%
1343    +0.15   0.5%
394     +0.14   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
394     +0.17      0.08
1261    +0.15      0.02
609     +0.14      0.06
Negative Logits
cytoplas         -0.65
throwaway        -0.60
Quantification   -0.55
logarith         -0.53
constamment      -0.50
frow             -0.50
Opportun         -0.50
razer            -0.50
дописавши        -0.49
aussitôt         -0.49
Positive Logits
[]"    0.86
").    0.71
”)     0.71
”).    0.70
”      0.68
")     0.67
”,     0.65
"),    0.65
”),    0.64
”:     0.64
Activations Density 1.036%