INDEX
    Explanations

    content related to political and social discussions

    phrases related to significant political events or decisions

    Negative Logits
    anwhile       -0.69
    )."           -0.62
    ).[           -0.57
     therefore    -0.55
    '."           -0.53
    .'"           -0.53
     meanwhile    -0.52
    .).           -0.49
     however      -0.49
    ]."           -0.48
    Positive Logits
     Canaver      0.55
    Spoiler       0.54
    FAQ           0.49
     Spoiler      0.48
    ensical       0.47
     precon       0.46
     unpre        0.46
     JPM          0.45
    ONY           0.44
     trolling     0.44
    Activation Density: 4.356%

    No Known Activations