INDEX
    Explanations

    pronouns that refer to subjects or objects in sentences

    Negative Logits
    " Bliss"     -0.14
    "rics"       -0.14
    " Strict"    -0.14
    "loys"       -0.14
    "apan"       -0.14
    "éľŀ"        -0.14
    "eland"      -0.14
    "nock"       -0.13
    "ît"         -0.13
    "gd"         -0.13
    Positive Logits
    " am"          0.23
    "ching"        0.19
    " used"        0.19
    " all"         0.18
    "achi"         0.18
    "AMI"          0.18
    " boiling"     0.17
    " ultimately"  0.17
    " Takes"       0.17
    " took"        0.17
    Act Density 0.137%

    No Known Activations