INDEX
    Explanations

    proper nouns, particularly names of individuals and titles

    Negative Logits
        " in"      -0.18
        " D"       -0.17
        " T"       -0.16
        " H"       -0.16
        " C"       -0.16
        " B"       -0.16
        " "        -0.15
        " from"    -0.15
        " with"    -0.15
        " to"      -0.15
    Positive Logits
        "in"       0.19
        "ar"       0.17
        "any"      0.17
        "an"       0.17
        "on"       0.17
        "ina"      0.17
        "it"       0.17
        "us"       0.16
        "im"       0.16
        "is"       0.16
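    The two lists rank tokens by how strongly this feature lowers or raises their logits. A minimal sketch of how such lists are commonly produced, assuming (the page does not say so) that each score is the dot product of the feature's decoder direction with that token's unembedding vector. The function names and toy numbers below are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical: score each token by dot(decoder direction, unembedding vector)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_k(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and k most positive tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]            # ascending: most negative first
    positive = ranked[::-1][:k]      # descending: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional residual stream and a three-token vocabulary.
    decoder_dir = [1.0, -0.5]
    unembed = {"a": [0.2, 0.1], "b": [-0.1, 0.3], "c": [0.0, -0.2]}
    neg, pos = top_k(logit_effects(decoder_dir, unembed), k=1)
    print("negative:", neg)
    print("positive:", pos)
```

    With real weights the vocabulary is tens of thousands of tokens, so only the top ten of each sign are shown, as on this page.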
    Act Density 0.510%
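    An activation density of 0.510% means the feature fires on roughly 51 of every 10,000 tokens. A minimal sketch of that calculation, assuming density is the percentage of token positions whose activation exceeds zero; the threshold and function name are hypothetical, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical: percentage of token positions where the feature is above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 51 firing positions out of 10,000 tokens.
    acts = [0.0] * 9949 + [1.0] * 51
    print(f"{activation_density(acts):.3f}%")
```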

    No Known Activations