INDEX
    Explanations
    Negative Logits
    Token        Logit
    " deterg"    -1.59
    "'ll"        -1.55
    " done"      -1.50
    " else"      -1.49
    " hearts"    -1.47
    "idem"       -1.47
    "dust"       -1.47
    "tics"       -1.43
    " knees"     -1.42
    " clean"     -1.41
    Positive Logits
    Token          Logit
    "ière"         1.67
    "ização"       1.66
    "chaft"        1.57
    "izing"        1.51
    "ulating"      1.50
    "ulator"       1.47
    "osecond"      1.47
    "finder"       1.46
    "uelle"        1.46
    " magazine"    1.45
    Activation Density: 0.017%

    No Known Activations