INDEX
    Explanations

    negative actions / crime

    Negative Logits
    "last"          -0.07
    " πάνω"         -0.06
    "(Card"         -0.06
    " bons"         -0.06
    " happiest"     -0.06
    " Rick"         -0.06
    (token not captured)  -0.06
    " Breakfast"    -0.06
    "ах"            -0.06
    " ทำ"           -0.06
    Positive Logits
    " Sco"          0.06
    "ometry"        0.06
    " Against"      0.06
    "인은"          0.06
    "utom"          0.06
    "functions"     0.06
    "br"            0.06
    "도가"          0.06
    "르는"          0.06
    "serie"         0.06
    Activation Density: 0.141%

    No Known Activations