INDEX
    Explanations

    words with accents and diacritics

    Negative Logits
    Token           Logit
     mileage        -0.62
    ************    -0.61
     masturb        -0.60
     Kimmel         -0.60
     Mandela        -0.59
     thumbs         -0.58
     Loll           -0.58
     consent        -0.58
     counting       -0.57
     Shed           -0.56

    Positive Logits
    Token           Logit
    ão               1.23
    ĩ                1.17
    oÄŁ              1.11
    İ                1.05
    Ģ                1.02
    oise             0.99
    ais              0.97
    ional            0.96
    ĭ                0.95
    į                0.92

    Activation Density: 0.030%

    No Known Activations
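
Some of the positive-logit tokens above (for example `oÄŁ` and `Ģ`) look like display strings from a byte-level BPE vocabulary rather than literal text. The sketch below assumes, without confirmation from the source, that the tokenizer uses the GPT-2-style byte-to-character table; the function names are hypothetical. Under that assumption, a display string can be mapped back to raw bytes and decoded as UTF-8, which may help when checking a token list against the "accents and diacritics" explanation.

```python
def bytes_to_unicode():
    """Byte -> display-character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer follows this scheme; the source
    does not name the model or tokenizer.
    """
    # Bytes that are shown as themselves.
    keep = (list(range(ord("!"), ord("~") + 1))
            + list(range(ord("¡"), ord("¬") + 1))
            + list(range(ord("®"), ord("ÿ") + 1)))
    table = {b: chr(b) for b in keep}
    # Every other byte is shifted to a code point at 256 and above.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: display character -> original byte value.
DISPLAY_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def display_to_bytes(token):
    """Convert a display string to raw bytes.

    Raises KeyError for characters outside the table (e.g. a literal space),
    which indicates the string was not in byte-level display form.
    """
    return bytes(DISPLAY_TO_BYTE[ch] for ch in token)


def decode_display_token(token):
    """Decode a display string as UTF-8; incomplete sequences become U+FFFD."""
    return display_to_bytes(token).decode("utf-8", errors="replace")


print(decode_display_token("oÄŁ"))  # full UTF-8 sequence for an accented letter
print(display_to_bytes("Ģ"))        # a lone UTF-8 continuation byte
```

If the assumption holds, single-character entries such as `Ģ` would be fragments of multi-byte characters rather than standalone letters. Entries that already appear decoded (such as `ão`) should not be passed through this mapping.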