Explanations
references to the term "Black," particularly in discussions regarding identity, community, or social issues
Neuron Alignment
Index   Value   % of L₁
50      +0.23   1.4%
680     +0.13   0.8%
521     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1505    +0.23      0.04
521     +0.13      0.04
1127    +0.12      0.04
Negative Logits
Token            Logit
<bos>            -2.99
ValueGenerated   -0.76
rungsseite       -0.75
<!--             -0.69
},[])            -0.67
dicionado        -0.67
IonicModule      -0.66
</>              -0.66
RepeatedField    -0.66
})();            -0.65
Positive Logits
Token     Logit
Juf       1.71
véhic     1.45
volunte   1.43
maneu     1.43
McLaugh   1.43
affor     1.42
Confu     1.42
Intere    1.41
Gorb      1.40
Augu      1.40
Activations Density: 0.064%