INDEX
    Explanations

    Follows "of" and precedes a pronoun

    of [determiner/pronoun]

    Negative Logits
      Token         Logit
      pleaſure      -0.88
      purpoſe       -0.86
      itſelf        -0.86
      himſelf       -0.85
      raiſ          -0.82
      fhort         -0.82
      reaſon        -0.80
      Houſe         -0.79
      cauſe         -0.79
      themſelves    -0.77

    Positive Logits
      Token         Logit
      us             0.83
      the            0.79
      these          0.70
      them           0.64
      it             0.64
      their          0.63
      his            0.61
      its            0.60
      those          0.57
      our            0.57

    Activation Density: 0.194%

    No Known Activations