    Negative Logits
    " над"        -0.96
    " landes"     -0.76
    " nene"       -0.74
    "不利"         -0.71
    " lieb"       -0.70
    " vak"        -0.69
    " née"        -0.68
    " Δη"         -0.67
    " Apalagi"    -0.67
    " seriously"  -0.66
    Positive Logits
    (not shown)    0.82
    "comfortable"  0.79
    " veiks"       0.77
    (not shown)    0.76
    " flourished"  0.76
    " бумаги"      0.75
    "lofen"        0.75
    "couvrir"      0.75
    " poichè"      0.74
    (not shown)    0.74
    Activation Density: 0.055%

    No Known Activations