    NEGATIVE LOGITS
     by        -1.59
     other     -1.54
     these     -1.42
     it        -1.33
     each      -1.32
     그리고     -1.26
     both      -1.23
    以及        -1.20
     setiap    -1.15
    好評        -1.13
    POSITIVE LOGITS
    thenburg           1.34
    (token not shown)  1.32
     estavam           1.19
     spiega            1.16
     médec             1.16
     sidste            1.13
     parteci           1.11
     começaram         1.10
     gestes            1.10
    (token not shown)  1.09
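The card does not say how these logit values were computed. One common approach in feature dashboards is to project the feature's output direction (for example, a sparse-autoencoder decoder row) through the model's unembedding matrix, then list the k most negative and k most positive tokens. The sketch below assumes that method. The names `decoder_vec`, `unembed`, and the toy vocabulary `tok_a`…`tok_d` are hypothetical and do not come from this page.

```python
from typing import List, Sequence, Tuple

def logit_weights(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature direction (length d_model) through an unembedding
    matrix (d_model rows x vocab_size columns), giving one score per token.
    Assumption: this is how the dashboard's logit values are produced."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]

def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending: most negative first
    positive = ranked[::-1][:k]  # descending: most positive first
    return negative, positive

# Hypothetical toy model: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed = [
    [-1.0, -0.5, 0.8, 0.6],
    [-0.5, -1.0, 0.4, 0.5],
]
decoder_vec = [1.0, 0.5]

scores = logit_weights(decoder_vec, unembed)
neg, pos = top_and_bottom(scores, vocab, k=2)
print(neg)
print(pos)
```

Here the scores are -1.25, -1.0, 1.0 and 0.85. So `tok_a` and `tok_b` head the negative list, and `tok_c` and `tok_d` head the positive list. Because such values are dot products with unembedding columns, they describe which tokens the feature pushes up or down directly. They do not describe when the feature fires.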
    Act Density 0.014%
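Activation density is usually reported as the percentage of token positions on which the feature's activation is above zero. The card does not state its threshold or the dataset it was measured on, so the sketch below assumes a threshold of 0. The function name `activation_density` and the toy data are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.
    Assumption: density is measured as the fraction of tokens with activation > 0."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

# Hypothetical toy data: the feature fires on 14 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(0, 14_000, 1_000):
    acts[i] = 2.5

print(f"Act Density {activation_density(acts):.3f}%")  # Act Density 0.014%
```

At 0.014%, a feature fires on roughly one token in 7,100 (1 / 0.00014 ≈ 7,143). With a density this low, a sample of text can easily contain no activating examples at all.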

    No Known Activations