INDEX
    Explanations

    first-person pronouns and personal reflections

    Negative Logits
    Token          Logit
    " Theſe"       -1.15
    " Beſ"         -1.05
    " Monfieur"    -0.98
    " Efq"         -0.88
    " Reſ"         -0.87
    " ſeveral"     -0.86
    " Anſ"         -0.84
    " Eſ"          -0.84
    " Diſ"         -0.82
    " Padang"      -0.79
    Positive Logits
    Token          Logit
    " I"           1.95
    "I"            1.49
    " We"          1.25
    " we"          1.24
    " i"           1.12
    "We"           1.09
    " he"          0.96
    "𝑰"            0.91
    "tôi"          0.91
    " He"          0.89
    Activation Density: 0.223%

    No Known Activations