INDEX
    Explanations

    phrases indicating contradictions or negative statements

    Negative Logits
    Logit    Token
    -0.72    __':\n
    -0.68    ագրություններ
    -0.68    endphp
    -0.67    ]")]
    -0.65    >--}}
    -0.65    InputBorder
    -0.65    ```\n
    -0.63    IntoConstraints
    -0.61    SequentialGroup
    -0.61    "]:
    Positive Logits
    Logit    Token
    0.88     (!)
    0.80     !!!
    0.80      (!)
    0.78     ?!?
    0.78     !!!!!
    0.77      freakin
    0.77     !!!!!!
    0.77     !!!!
    0.75     ¡¡¡
    0.74     !
    Activation Density: 0.127%

    No Known Activations