INDEX
Explanations
social controversies and conflicts
Neuron Alignment
Index   Value   % of L₁
674     +0.12   0.3%
198     +0.11   0.3%
1842    +0.09   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
640     +0.12      0.05
198     +0.11      0.05
155     +0.09      0.05
Negative Logits
increa      -0.85
michelin    -0.85
fortn       -0.83
reluct      -0.83
désol       -0.82
encomp      -0.82
shenan      -0.81
nephe       -0.79
Manufact    -0.77
secon       -0.77
Positive Logits
dared        0.57
<?           0.57
slightest    0.56
кӀ           0.54
hatenablog   0.51
toPromise    0.50
=>'          0.48
oara         0.48
liothèque    0.48
álbum        0.47
Activations Density: 0.516%