INDEX
    Explanations
    Negative Logits
    (es         -0.08
     rubber     -0.08
    Rede        -0.08
     Lia        -0.08
    Shar        -0.08
     pavement   -0.07
     ego        -0.07
    kf          -0.07
    psz         -0.07
     tow        -0.07
    Positive Logits
    works       0.10
    logged      0.10
     vapor      0.09
    bank        0.08
    (水          0.08
     marinade   0.07
     Coral      0.07
    borne       0.07
     curing     0.07
    melon       0.07
    Activation Density: 0.043%

    No Known Activations