INDEX
    NEGATIVE LOGITS
    "volves"                -0.89
    "helping"               -0.86
    (token not rendered)    -0.85
    " Allows"               -0.82
    "ebly"                  -0.82
    "giving"                -0.80
    "iczna"                 -0.77
    " tiveram"              -0.77
    "emm"                   -0.76
    "そう"                  -0.76
    POSITIVE LOGITS
    " make"                  4.41
    " makes"                 4.38
    " making"                3.06
    "Make"                   3.05
    "make"                   3.05
    " Makes"                 3.03
    "Makes"                  2.97
    " Make"                  2.86
    "makes"                  2.77
    "MAKE"                   2.31
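This page does not say how its logit lists were computed. A common convention for SAE dashboards is a "logit lens" on the feature: multiply the feature's decoder direction by the model's unembedding matrix, then keep the most positive and most negative entries. The sketch below assumes that convention. `logit_effects`, `top_and_bottom`, and the toy weights are hypothetical stand-ins, not real model parameters or any library's API.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction (length d_model) with each
    column of the unembedding matrix (d_model rows x vocab_size columns).
    Assumption: no final layer norm is applied."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(vocab: Sequence[str], scores: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # ascending order: most negative first
    most_positive = ranked[::-1][:k]    # descending order: most positive first
    return most_positive, most_negative

# Hypothetical toy values (d_model = 2, vocabulary of 4 tokens), not real weights.
vocab = [" make", " makes", "helping", "volves"]
decoder_dir = [1.0, 0.5]
unembed = [
    [2.0, 2.0, -0.5, -0.6],
    [1.0, 0.5, -0.2, -0.4],
]
scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(vocab, scores, k=2)
print("positive:", positive)
print("negative:", negative)
```

Over a full vocabulary with k = 10, the same ranking would produce two ten-entry lists shaped like the ones on this page. Leading spaces are part of the token strings, which is why " make" and "make" are ranked separately.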
    Activation Density: 0.035%
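Activation density is usually read as the percentage of sampled tokens on which the feature's activation is above zero. The minimal sketch below assumes that reading. The zero threshold, the helper names, and the sample activations are hypothetical. At 0.035%, the feature would fire on roughly 350 of every million tokens, or about 1 in 2,900.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (assumed to be 0.0, i.e. the feature 'fires' whenever it is nonzero)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

def expected_firing_tokens(density_percent: float, num_tokens: int) -> float:
    """Convert a density percentage back into an expected number of firing tokens."""
    return density_percent / 100.0 * num_tokens

# Hypothetical activations over 8 tokens; the feature fires on 2 of them.
print(activation_density([0.0, 0.0, 3.1, 0.0, 0.0, 0.7, 0.0, 0.0]))  # 25.0
# At 0.035%, roughly 350 of every 1,000,000 tokens would activate the feature.
print(expected_firing_tokens(0.035, 1_000_000))
```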

    No Known Activations