INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Logit
                  -0.07
     bp           -0.07
     добре        -0.06
    κι            -0.06
    protein       -0.06
    арх           -0.06
     gener        -0.06
    .squareup     -0.06
    -how          -0.06
    ок            -0.06
    POSITIVE LOGITS
    Token         Logit
    电视          0.07
    TOR           0.06
    .conn         0.06
    ificação      0.06
     pagar        0.06
    (TYPE         0.06
    умент         0.06
    unya          0.06
     Wohn         0.06
     Vu           0.06
    Act Density 0.028%

    No Known Activations