INDEX
    Negative Logits
    "which"  0.67
    " которая"  0.67
    " который"  0.63
    " которое"  0.63
    " позволит"  0.63
    " which"  0.60
    " možete"  0.59
    " която"  0.58
    " budete"  0.58
    "你會"  0.56

    Positive Logits
    " themselves"  1.06
    " are"  0.98
    " ovat"  0.90
    " هستند"  0.89
    " have"  0.86
    " were"  0.84
    " являются"  0.83
    " aren"  0.83
    " हैं"  0.80
    " eivät"  0.79

    Act Density 0.100%

    No Known Activations