INDEX
    Explanations

    phrases related to consequences or predictions

    phrases related to negative consequences or warnings

    Negative Logits
    Token            Logit
    " reception"     -0.64
    " Paran"         -0.63
    " Malays"        -0.62
    " Lithuan"       -0.62
    " Lindsey"       -0.62
    " Ivy"           -0.61
    " Malaysian"     -0.61
    " Monteneg"      -0.60
    "htaking"        -0.58
    " Zika"          -0.58
    Positive Logits
    Token                   Logit
    " automatically"        0.86
    " suddenly"             0.79
    "Ł"                     0.76
    " indistinguishable"    0.74
    "İ"                     0.71
    " disappears"           0.70
    "Untitled"              0.70
    " spontaneously"        0.70
    " becomes"              0.70
    " magically"            0.69
    Activation Density: 0.564%

    No Known Activations