    Explanations

    text written in a non-English language, specifically featuring the character "ä"

    instances of a specific character or symbol

    NEGATIVE LOGITS
    Token               Logit
    ORED                -0.82
     Sussex             -0.68
     Jericho            -0.67
     Mayweather         -0.66
    IFIED               -0.66
     Bullets            -0.63
     Hodg               -0.60
    ################    -0.59
     Asians             -0.58
     Notting            -0.58
    POSITIVE LOGITS
    Token               Logit
    ä                   1.27
    inen                1.18
    ¢                   1.10
    ternity             1.05
    ·                   0.96
    ¯¯¯¯                0.93
                        0.93
    ë                   0.91
    ï                   0.90
    ö                   0.88
    Activation Density: 0.012%

    No Known Activations