    Explanations

    Instances of capitalization, likely focusing on proper nouns or significant terms.

    Negative Logits
    nen        -0.19
    li         -0.18
    nes        -0.17
    nees       -0.17
    ss         -0.17
    lo         -0.17
    rd         -0.17
    º          -0.17
    rome       -0.16
    loe        -0.15
    Positive Logits
    izabeth     0.17
    SCO         0.16
    ős          0.15
    ichel       0.15
    -disable    0.15
    ành         0.15
    /disable    0.15
    erald       0.14
    zzo         0.14
    enis        0.14
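    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the lowest and highest scoring tokens. The dashboard does not state its method, so the following is a minimal sketch under that assumption; the function name `top_logit_tokens` and its arguments are hypothetical, and real model weights would replace the toy arrays.

```python
import numpy as np


def top_logit_tokens(w_dec_row, w_unembed, vocab, k=10):
    """Score every vocabulary token against one feature direction (assumed method).

    w_dec_row: shape (d_model,), the feature's decoder vector (hypothetical input).
    w_unembed: shape (d_model, vocab_size), the model's unembedding matrix.
    vocab: token strings indexed by token id.
    Returns (negative, positive): the k lowest and k highest (token, score) pairs.
    """
    logits = w_dec_row @ w_unembed          # one score per vocabulary token
    order = np.argsort(logits)              # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["izabeth", "nen", "SCO", "li"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 0.1, -0.4],
                      [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logit_tokens(w_dec_row, w_unembed, vocab, k=2)
print(neg)  # two lowest-scoring tokens
print(pos)  # two highest-scoring tokens
```

    Because only the decoder direction and the unembedding are involved, these scores describe what the feature pushes the output toward, independent of any particular input text.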
    Activation density: 0.147%
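    Activation density is usually the fraction of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the dashboard does not specify its threshold or sample, and the helper name `activation_density` is hypothetical. At 0.147%, the feature would fire on roughly 1,470 of every 1,000,000 tokens.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature is active.

    acts: the feature's activations over a token sample (any shape).
    threshold: values strictly above this count as firing (assumed to be 0).
    """
    acts = np.asarray(acts, dtype=float)
    return np.count_nonzero(acts > threshold) / acts.size


# Toy sample: the feature fires on 2 of 5 positions.
density = activation_density([0.0, 0.3, 0.0, 0.0, 1.2])
print(f"{100 * density:.3f}%")
```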

    No Known Activations