INDEX
    Explanations

    phrases that indicate warnings or calls to action regarding societal or systemic issues

    Negative Logits
    },'       -0.15
    един      -0.15
    gregar    -0.14
    iri       -0.14
     Welch    -0.14
    ENU       -0.14
    ohn       -0.13
    ered      -0.13
    æ£        -0.13
    ëĭ´       -0.13
    Positive Logits
    roe          0.16
     Ramp        0.15
     scale       0.15
     bod         0.14
     Bod         0.14
    Scale        0.14
     Dispatch    0.14
    Alternate    0.14
    scale        0.14
    än           0.14
    Activation Density: 0.261%

    No Known Activations