    Negative Logits
    Token        Logit
    " energy"    -0.76
    " Ihren"     -0.76
    " remain"    -0.73
    " detalla"   -0.73
    "程度の"      -0.72
    "akang"      -0.71
    "± "         -0.70
                 -0.69
    " oč"        -0.69
    "ahin"       -0.69
    Positive Logits
    Token        Logit
    " Dil"       1.28
    " DIL"       1.20
    "pickle"     1.16
    " dil"       1.16
    "Dil"        1.14
    " Pickle"    1.13
    "dil"        1.09
    " pickle"    1.09
    " dilu"      1.08
    "Pickle"     1.07
    Activation Density: 0.020%

    No Known Activations