INDEX
    Explanations

    phrases indicating legal or political contexts

    Negative Logits
    " overall"    -0.15
    " also"       -0.15
    "403"         -0.14
    " later"      -0.14
    "Of"          -0.14
    "μη"          -0.14
    "ping"        -0.13
    "later"       -0.13
    "út"          -0.13
    "ãĤ¡"         -0.13

    Positive Logits
    "Ñĥков"       0.15
    "-Nazi"       0.14
    "GMT"         0.14
    "oller"       0.14
    "oga"         0.14
    "dın"         0.13
    "ãĥ¼ãĥĦ"      0.13
    "ISIBLE"      0.13
    "steder"      0.13
    "ÏģÏī"        0.13

    Act Density 0.064%

    No Known Activations