INDEX
    Explanations

    instances of people being criticized or attacked for various reasons

    the word "for" in various contexts, indicating a focus on prepositional phrases

    Negative Logits
    Token       Logit
    "illin"     -0.82
    "edin"      -0.72
    "atl"       -0.71
    "Mine"      -0.67
    "awan"      -0.66
    " ®"        -0.64
    "NET"       -0.64
    "abo"       -0.62
    "mare"      -0.61
    "nan"       -0.61
    Positive Logits
    Token             Logit
    "geries"          1.10
    " daring"         1.03
    " violating"      1.02
    " failing"        1.00
    " lack"           0.95
    " reasons"        0.94
    " refusing"       0.93
    " questioning"    0.89
    "gery"            0.88
    " breaching"      0.86
    Activation Density: 0.144%

    No Known Activations