INDEX
    Explanations

    text written in a language with special characters and accents

    instances of a specific character or symbol related to dialogue or subjects in text

    Negative Logits
     Hodg         -0.67
     Jericho      -0.63
     patched      -0.63
    ORED          -0.63
     kernels      -0.62
     Mayweather   -0.60
     immune       -0.59
     Iro          -0.58
     Asians       -0.58
     proxies      -0.58
    Positive Logits
    inen          1.39
    nder          1.22
    ä             1.15
    nen           1.15
    tten          1.11
    ternity       1.04
    ¢             1.00
    ng            0.98
    ki            0.96
    lde           0.95
    Activation Density: 0.014%

    No Known Activations