INDEX
    Explanations

    issues related to personal rights and freedoms

    Negative Logits
        " FUCK"        -0.16
        "Ïĩη"          -0.16
        " fuck"        -0.15
        " fucking"     -0.15
        " shitty"      -0.15
        " Fuck"        -0.15
        " sorts"       -0.15
        " sort"        -0.15
        "Sorted"       -0.15
        "ÑĩаÑĤ"        -0.14
    Positive Logits
        " um"           0.23
        " uh"           0.23
        "--"            0.22
        " --"           0.22
        " ya"           0.18
        " sir"          0.17
        " -"            0.17
        " ("            0.16
        "{}"            0.16
        "--,"           0.15
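    The page does not say how these logit scores are produced. One common approach, assumed here rather than confirmed by this dashboard, is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the result. The sketch below is a minimal pure-Python illustration of that idea; `logit_effects` and `rank_logits` are hypothetical helper names, and the inputs are toy values, not this model's weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: score every token by the dot product of the
    feature's decoder direction (length d_model) with that token's column
    of the unembedding matrix (d_model rows x len(vocab) columns).
    Assumes this is how the dashboard's logit scores are defined."""
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal the number of unembedding rows")
    scores: Dict[str, float] = {}
    for t, token in enumerate(vocab):
        scores[token] = sum(d * row[t] for d, row in zip(decoder_dir, unembed))
    return scores


def rank_logits(scores: Dict[str, float],
                k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive
    (token, score) pairs. Python's sort is stable, so tied tokens keep
    their input order in both lists."""
    negative = sorted(scores.items(), key=lambda item: item[1])[:k]
    positive = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    return negative, positive


if __name__ == "__main__":
    # A few (token, score) pairs copied from the table above.
    page_scores = {" um": 0.23, " uh": 0.23, "--": 0.22,
                   " FUCK": -0.16, " fuck": -0.15, " sort": -0.15}
    neg, pos = rank_logits(page_scores, k=2)
    print(neg)  # [(' FUCK', -0.16), (' fuck', -0.15)]
    print(pos)  # [(' um', 0.23), (' uh', 0.23)]
```

    If the vocabulary has fewer than 2k tokens, the two lists will overlap; a real vocabulary (tens of thousands of tokens) makes that moot.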
    Activation Density 0.075%
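    Activation density is presumably the share of tokens in some evaluation sample on which the feature fires; the page states neither the sample nor the firing threshold, so both are assumptions here. At 0.075%, that works out to about 3 active tokens in every 4,000. A minimal sketch of the calculation, using a hypothetical `activation_density` helper and a default threshold of 0:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation is
    strictly greater than `threshold` (assumed to be 0 by default)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 75 firing tokens out of 100,000 gives the density shown on this page.
    acts = [1.0] * 75 + [0.0] * 99_925
    print(activation_density(acts))
```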

    No Known Activations