    Explanations

    don't be something negative

    Negative Logits
    Logit   Token
    -2.36   " that"
    -2.31   " it"
    -2.11   (token missing)
    -1.95   "过年"
    -1.92   (token missing)
    -1.85   (token missing)
    -1.81   (token missing)
    -1.81   " veicolo"
    -1.80   "when"
    -1.77   "ではでは"
    Positive Logits
    Logit   Token
    1.73    " هیچ"
    1.57    "9"
    1.55    " fri"
    1.54    "はいけない"
    1.51    " بعض"
    1.49    " những"
    1.48    (token missing)
    1.46    "<bos>"
    1.45    "})=\"
    1.45    " chill"
    Activation Density: 0.007%

    No Known Activations