INDEX
    Explanations

    statements reflecting societal issues and personal experiences related to social behaviors and norms

    Negative Logits
     itſelf       -1.01
    InSection     -1.00
     Jefus        -0.93
     Majefty      -0.91
     themſelves   -0.87
     myſelf       -0.87
    ſelf          -0.85
     himſelf      -0.84
     pleaſure     -0.82
    ſelves        -0.81
    Positive Logits
     im           0.48
                  0.40
    İstinadlar    0.40
     these        0.39
     las          0.38
    saraba        0.38
    рост          0.38
    "             0.37
     an           0.37
     "            0.37
    Activation Density 0.438%

    No Known Activations