    Negative Logits
    Token          Logit
    " daba"        -0.95
    " luce"        -0.94
    " años"        -0.89
    " cruz"        -0.85
    " rusak"       -0.84
    " italiano"    -0.84
    (not shown)    -0.83
    " would"       -0.82
    " кроме"       -0.82
    "ССР"          -0.81
    Positive Logits
    Token          Logit
    "適量"         0.96
    " acolo"       0.96
    " and"         0.91
    " aici"        0.89
    " センター"    0.88
    "vity"         0.87
    " Decrease"    0.87
    "tabria"       0.86
    " gradation"   0.85
    "verständ"     0.84
    Activation Density: 0.008%

    No Known Activations