INDEX
    Explanations

    phrases indicating criticism or nuanced evaluations of behavior

    "At" followed by superlative adjectives

    at least, at worst, at best

    Negative Logits
    Token        Logit
    "ako"        -0.47
    "czy"        -0.45
    "ÉM"         -0.44
    "tung"       -0.44
    "ak"         -0.43
    "tyd"        -0.42
    "FORME"      -0.41
    "Kjelder"    -0.41
    "udy"        -0.40
    "iNdEx"      -0.40
    Positive Logits
    Token        Logit
    " atleast"   1.18
    " best"      1.07
    " höch"      1.03
    " almeno"    1.01
    "best"       0.99
    " макси"     0.99
    " máximo"    0.99
    "至少"       0.98
    " worst"     0.97
    "worst"      0.96
    Activation Density: 0.210%

    No Known Activations