INDEX
    Explanations

    instances where someone is blocked or unblocked on social media platforms like Twitter

    instances of the word "block" and its variations

    Negative Logits
    Token         Logit
    " subp"       -0.73
    " mortar"     -0.64
    " warr"       -0.63
    " vapor"      -0.62
    " capsule"    -0.62
    "ppa"         -0.60
    "chief"       -0.59
    " appropri"   -0.59
    " princ"      -0.58
    " poster"     -0.57

    Positive Logits
    Token         Logit
    "ances"        0.97
    "ables"        0.94
    "enged"        0.93
    "ible"         0.89
    "able"         0.87
    "ーク"         0.85
    "hemy"         0.83
    "ers"          0.80
    "ement"        0.77
    "zee"          0.76
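    The two lists rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). The following is a minimal sketch of that ranking. It assumes one logit weight is already available per vocabulary token. For a sparse-autoencoder feature, that weight is commonly the decoder direction projected through the model's unembedding, but that is an assumption here. The function name `top_logit_tokens` is hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logit_tokens(
    weights: Sequence[float], vocab: Sequence[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, weight) pairs.

    Hypothetical helper: weights[i] is assumed to be the feature's logit
    effect on vocabulary token vocab[i].
    """
    if len(weights) != len(vocab):
        raise ValueError("weights and vocab must have the same length")
    # Token indices sorted from most negative to most positive weight.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    negative = [(vocab[i], weights[i]) for i in order[:k]]
    # Reverse first, then slice, so the most positive token comes first
    # and k=0 correctly yields an empty list.
    positive = [(vocab[i], weights[i]) for i in order[::-1][:k]]
    return negative, positive


# Toy example using a few values from the lists above.
neg, pos = top_logit_tokens(
    [-0.73, -0.64, 0.97, 0.94, 0.76],
    [" subp", " mortar", "ances", "ables", "zee"],
    k=2,
)
print(neg)  # [(' subp', -0.73), (' mortar', -0.64)]
print(pos)  # [('ances', 0.97), ('ables', 0.94)]
```

    Sorting once and slicing both ends keeps the sketch simple. For a full vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.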
    Activation Density: 0.026%
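    A density of 0.026% corresponds to the feature firing on roughly 26 of every 100,000 sampled tokens. The following is a minimal sketch of that arithmetic. It assumes density is the percentage of tokens whose activation is above zero. Both the zero threshold and the function name `activation_density` are assumptions, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens with a positive
    activation; the 0.0 threshold is a hypothetical default.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 26 active tokens out of 100,000 sampled gives a density of about 0.026%.
acts = [1.0] * 26 + [0.0] * (100_000 - 26)
print(f"{activation_density(acts):.3f}%")  # 0.026%
```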

    No Known Activations