INDEX
    Explanations
    Negative Logits
    ledge          -0.17
    ounce          -0.16
    wares          -0.15
    ount           -0.15
    ounced         -0.14
    weep           -0.14
    oned           -0.14
    OUS            -0.14
    omial          -0.14
    Enumeration    -0.14
    POSITIVE LOGITS
    idelberg       0.33
    aven           0.28
    fty            0.28
    inz            0.28
    arty           0.28
    ads            0.27
    ather          0.27
    aviest         0.27
    avier          0.27
    brew           0.26
    Activation Density: 0.038%

    No Known Activations