    Explanations

    phrases emphasizing negation or denial

    Negative Logits
    Token           Logit
    " attente"      -0.67
    "addObject"     -0.67
    "vidia"         -0.66
    "ionage"        -0.66
    "orthand"       -0.65
    "epiece"        -0.65
    "stdc"          -0.64
    "Descripció"    -0.64
    "ạnh"           -0.62
    "Pozdrawiam"    -0.62

    Positive Logits
    Token           Logit
    " never"        2.89
    "never"         2.67
    " Never"        2.66
    "Never"         2.61
    " NEVER"        2.53
    "NEVER"         2.41
    " nunca"        1.94
    " Nunca"        1.92
    "Nunca"         1.85
    "nunca"         1.75
    Activation Density: 0.047%

    No Known Activations