INDEX
    Explanations

    causative terms, indicating actions leading to certain consequences

    phrases indicating causal relationships or changes in society

    Negative Logits
        " Moss"        -0.66
        " Owens"       -0.64
        "urden"        -0.63
        " moss"        -0.61
        " Ware"        -0.61
        " Pool"        -0.61
        " Winc"        -0.60
        " Licensed"    -0.60
        " Monitor"     -0.59
        "Cola"         -0.59
    Positive Logits
        " revolutions"  0.86
        " havoc"        0.80
        "MpServer"      0.76
        "EStream"       0.74
        "riots"         0.70
        "uate"          0.70
        "ocument"       0.70
        "choes"         0.69
        "Discover"      0.68
        "versible"      0.67
    Activation Density: 0.044%

    No Known Activations