INDEX
Explanations
phrases related to societal issues and narratives about race and privilege
Follows "Q:" or "of"
identification of
Negative Logits
feroit          -0.90
pouvoit         -0.89
auroit          -0.87
Chriftian       -0.85
étoient         -0.84
étoit           -0.84
oprot           -0.82
avoient         -0.80
enfans          -0.79
SourceChecksum  -0.79
Positive Logits
even            0.79
</thead>        0.58
sogar           0.55
etc             0.54
Even            0.52
何より          0.51
s               0.50
hatta           0.50
des             0.49
zelfs           0.49
Activations Density: 0.374%