INDEX
    Explanations

    police and associated actions

    Negative Logits
    i         0.85
    و         0.80
    the       0.75
     is       0.74
    nější     0.74
    to        0.73
    r         0.73
    (blank)   0.71
     νέ       0.68
     zwią     0.68
    Positive Logits
    (blank)   0.80
     you      0.76
    (blank)   0.75
    (blank)   0.71
    (blank)   0.71
    им        0.69
    ο         0.69
    א         0.67
    об        0.64
    ти        0.63
    Act Density 0.003%

    No Known Activations