INDEX
    Explanations

    first-person pronouns indicating personal experiences or feelings

    Negative Logits
    " Theſe"         -1.08
    " Beſ"           -1.03
    " Monfieur"      -0.90
    " Efq"           -0.84
    " Padang"        -0.83
    " Reſ"           -0.81
    " ſeveral"       -0.81
    " Eſ"            -0.81
    " themſelves"    -0.80
    " CER"           -0.78

    Positive Logits
    " I"              2.06
    "I"               1.61
    " i"              1.21
    " We"             1.11
    " we"             1.05
    "We"              0.99
    " Me"             0.97
    "𝗜"               0.97
    " My"             0.96
    "𝑰"               0.95
    Activation Density: 0.276%

    No Known Activations