INDEX
    Explanations

    equals sign

    Negative Logits
     plaintiffs   -0.09
     Canadiens    -0.09
    Friends       -0.08
     senators     -0.08
    ости          -0.08
     garantie     -0.07
     Curtain      -0.07
     Senators     -0.07
                  -0.07
     Plaint       -0.07
    Positive Logits
     poka         0.08
     jump         0.08
     trabalha     0.08
     PT           0.07
     konf         0.07
    ogg           0.07
    c             0.07
     sao          0.07
     stark        0.07
     almond       0.07
    Activation Density 0.022%

    No Known Activations