    Explanations

    negations or phrases emphasizing the absence of something

    Negative Logits
    codehaus        -0.83
    ässe            -0.69
    Moose           -0.68
    λους            -0.68
     Trasp          -0.66
    Datuak          -0.65
    "`              -0.65
     mxArray        -0.65
                    -0.65
    いらっしゃ      -0.64
    Positive Logits
     nothing        1.65
    nothing         1.64
    NOTHING         1.62
     Nothing        1.60
     NOTHING        1.55
    Nothing         1.54
     anything       1.20
     Anything       1.19
    anything        1.19
     nothin         1.17
    Activation Density: 0.052%

    No Known Activations