    Negative Logits
    Token             Logit
    "hode"            -0.73
    "arb"             -0.73
    "Stair"           -0.69
    "ラダ"            -0.67
    " recrutement"    -0.64
    "uza"             -0.63
    "boost"           -0.63
    " Packed"         -0.63
    "aula"            -0.62
    "从未"            -0.61

    Positive Logits
    Token             Logit
    " moments"         2.05
    " moment"          1.77
    " Moments"         1.76
    "moment"           1.67
    " Moment"          1.64
    "moments"          1.57
    "Moment"           1.57
    "Moments"          1.47
    " momenti"         1.32
    " Momente"         1.30

    Activation Density: 0.049%

    No Known Activations