INDEX
    Explanations

    questions or statements related to government policies and freedom of expression

    Negative Logits
    <bos>  -0.56
    Itz  -0.50
     forbear  -0.46
     poc  -0.45
     dora  -0.45
     gero  -0.45
     plagio  -0.44
     ete  -0.44
     Galer  -0.43
     usus  -0.43
    Positive Logits
    otheby  0.65
    ">...  0.64
    hastly  0.63
    USTAIN  0.63
    oarece  0.61
     anymore  0.60
    dyž  0.60
     venuto  0.60
     sentito  0.58
     statunit  0.57
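The two lists above give the output tokens this feature most directly suppresses and promotes. This page does not say how the lists were computed. A common approach, assumed here, projects the feature's decoder direction through the model's unembedding matrix (W_U) and sorts the vocabulary by the result.

The sketch below is a minimal, pure-Python illustration of that approach. `decoder_dir`, `unembed`, `vocab`, and the toy numbers are hypothetical placeholders, not values from this feature.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Score every vocabulary token by projecting a feature direction through W_U.

    decoder_dir: list of d_model floats (hypothetical feature decoder direction)
    unembed: d_model rows, each a list of len(vocab) floats (hypothetical W_U)
    vocab: list of token strings, one per unembedding column
    Returns (most_negative, most_positive), each a list of (token, score) pairs.
    """
    # Score for token j is the dot product of decoder_dir with column j of W_U.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    # Sort ascending: the front holds the most negative scores, the back the most positive.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy example with d_model = 2 and a 3-token vocabulary.
vocab = ["a", "b", "c"]
decoder_dir = [1.0, 0.5]
unembed = [[0.2, -0.4, 0.1], [0.6, 0.0, -1.2]]
neg, pos = logit_effects(decoder_dir, unembed, vocab, k=2)
print([tok for tok, _ in neg])  # prints ['c', 'b']
print([tok for tok, _ in pos])  # prints ['a', 'b']
```

A real model has a vocabulary in the tens of thousands. There you would use a matrix library rather than nested Python loops, but the ranking logic is the same.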
    Activation Density 0.422%
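Activation density is usually read as the percentage of token positions on which the feature fires. On that reading, 0.422% means roughly 4.22 firing positions per 1,000 tokens.

The sketch below assumes "fires" means an activation strictly greater than a threshold of 0. That threshold is an assumption; the page does not state its cutoff. `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    `threshold` defaults to 0.0 (assumed definition of "firing").
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: 1,000 positions, 3 of which are strictly positive.
acts = [0.0] * 996 + [1.2, 0.3, 0.0, 2.5]
print(f"{activation_density(acts):.3f}%")  # prints 0.300%
```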

    No Known Activations