    Negative Logits
        (token not shown)    0.67
        "Un"                 0.64
        " one"               0.63
        (token not shown)    0.62
        " une"               0.60
        (token not shown)    0.59
        " Un"                0.56
        "常常"               0.56
        "P"                  0.51
        (token not shown)    0.51
    Positive Logits
        "<unused505>"        1.55
        " FTIR"              1.51
        " Chern"             1.49
        "nico"               1.48
        " Gerber"            1.48
        (token not shown)    1.47
        "<unused1667>"       1.46
        " Bors"              1.46
        " merch"             1.44
        "ategoria"           1.43
    Activation Density: 0.435%

    No Known Activations