INDEX
    Explanations

    proper nouns, particularly names and titles

    Negative Logits
    IIIK      -0.14
    ezi       -0.14
    _Tis      -0.14
    ckill     -0.14
    _mB       -0.14
    _tF       -0.14
    gamber    -0.13
    toi       -0.13
     thereof  -0.13
    igon      -0.13
    Positive Logits
    ers       0.20
    ies       0.20
    ism       0.17
    our       0.17
    ie        0.16
    ory       0.16
    ard       0.16
    u         0.16
    ler       0.15
    ane       0.15
    Activation Density: 0.238%

    No Known Activations