    Explanations

    phrases related to societal issues and narratives about race and privilege

    Follows "Q:" or "of"

    Negative Logits
     feroit          -0.90
     pouvoit         -0.89
     auroit          -0.87
     Chriftian       -0.85
     étoient         -0.84
     étoit           -0.84
     oprot           -0.82
     avoient         -0.80
     enfans          -0.79
    SourceChecksum   -0.79
    Positive Logits
     even      0.79
    </thead>   0.58
     sogar     0.55
     etc       0.54
     Even      0.52
    何より     0.51
     s         0.50
     hatta     0.50
     des       0.49
     zelfs     0.49
    Activation Density: 0.374%

    No Known Activations