INDEX
    Explanations

    specific names, potentially foreign, with characters like 'Ã' and 'ĸ'

    instances of certain characters or names in text
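    Characters like 'Ã' and 'ĸ' usually appear because byte-level BPE vocabularies store each raw UTF-8 byte as a printable stand-in character, so a multi-byte letter shows up as two odd-looking symbols. The sketch below assumes a GPT-2-style byte-to-character table (the dashboard does not name its tokenizer) and inverts it to recover readable text. `bytes_to_unicode` follows that published mapping; `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Return a GPT-2-style table mapping each byte (0-255) to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (space, control bytes, etc.) are shifted to code points 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table so each displayed character maps back to its raw byte.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed byte-level token back into readable UTF-8 text (hypothetical helper)."""
    # Dashboards often show the leading space literally; the raw vocabulary uses "Ġ".
    token = token.replace(" ", "Ġ")
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in [" Ãĸ", "oÄŁan"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

    Apply it only to tokens still in byte-level form: an already-decoded token such as " Gö" contains 'ö', which this table maps to a lone byte, so it would decode to a replacement character.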

    Negative Logits
        " Indigo"         -0.77
        " Cassidy"        -0.74
        " Annex"          -0.68
        "itute"           -0.67
        " Lesbian"        -0.66
        " Indianapolis"   -0.66
        " situ"           -0.66
        "OWER"            -0.64
        " Crus"           -0.64
        " arts"           -0.63
    Positive Logits
        " Ãĸ"       1.14   (byte-level form of " Ö")
        "nder"      0.92
        "sten"      0.89
        "istani"    0.89
        "yip"       0.86
        "uria"      0.84
        "ön"        0.82
        " Gö"       0.81
        "oÄŁan"     0.80   (byte-level form of "oğan")
        "thal"      0.79
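    On SAE feature dashboards, logit lists like these typically show the feature's direct effect on the output: its decoder direction multiplied by the model's unembedding matrix, with the resulting per-token scores sorted. The page does not state how its values were computed, so the sketch below assumes that definition and uses tiny hand-made lists in place of real weights; `top_logit_effects` and all inputs are hypothetical.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    decoder_dir: the feature's decoder vector, length d_model (hypothetical input)
    unembed:     d_model rows, each of length len(vocab) (the unembedding matrix)
    vocab:       token strings, one per unembedding column
    Returns (most_negative, most_positive) as lists of (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # score[t] = sum_i decoder_dir[i] * unembed[i][t], i.e. decoder_dir @ unembed
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]        # lowest scores first
    most_positive = ranked[::-1][:k]  # highest scores first
    return most_negative, most_positive


# Toy example with d_model = 2 and a four-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 0.9, 0.0],
    [0.6, 0.2, -0.2, 0.1],
]
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    Sorting once and slicing both ends keeps the two lists consistent with each other; with real weights the same idea is a single matrix-vector product over the whole vocabulary.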
    Activation Density: 0.005% (about 1 in 20,000 tokens)
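    Activation density is typically the share of sampled tokens on which the feature's activation exceeds zero. A minimal sketch under that assumption (the sample size and threshold behind the dashboard figure are not given; `activation_density` is a hypothetical helper):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold)."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# One firing token out of 20,000 sampled tokens.
acts = [0.0] * 19_999 + [3.2]
print(f"{activation_density(acts):.3%}")
```

    At a density this low, a feature can easily have no recorded activating examples in a modest sample, since on average it fires on only one token in every 20,000.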

    No Known Activations