INDEX
Explanations
terms related to societal issues, including minority groups, inequality, and social justice movements
Neuron Alignment
Index   Value   % of L₁
198     +0.15   0.4%
1842    +0.13   0.4%
856     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
198     +0.15      0.06
1510    +0.13      0.04
1202    +0.09      0.03
Negative Logits
<bos>              -0.86
RTSC               -0.66
FormTagHelper      -0.63
LabelTagHelper     -0.61
styleUrls          -0.61
Palmar             -0.61
setDo              -0.60
CodedInputStream   -0.60
Manbalar           -0.57
Wiktionnaire       -0.57
Positive Logits
increa        1.40
affor         1.37
guarante      1.37
encomp        1.34
shenan        1.34
apprehen      1.32
strick        1.30
intersper     1.30
gaily         1.30
scrat         1.29
Activations Density: 0.353%