INDEX
    Explanations

    phrases expressing enjoyment, forgetfulness, and responsibility related to social issues and personal values

    Negative Logits
     purpoſe      -0.96
     Majefty      -0.91
     ſeveral      -0.88
     myſelf       -0.84
     ſtate        -0.83
     themſelves   -0.82
     neceff       -0.82
     Monfieur     -0.81
     ainfi        -0.81
     auroit       -0.81
    Positive Logits
     also         0.88
     the          0.71
     others       0.64
     anything     0.62
     is           0.61
     other        0.61
     much         0.60
     a            0.60
     none         0.60
     at           0.59
    Act Density 0.636%

    No Known Activations