INDEX
    Explanations

    phrases indicating features, qualities, or items in a list

    phrases that list or enumerate examples or factors

    Negative Logits
    "wan"       -0.69
    "enary"     -0.69
    "orem"      -0.68
    "orse"      -0.66
    "athing"    -0.65
    "mit"       -0.60
    "uers"      -0.60
    "uni"       -0.60
    "orship"    -0.59
    "idates"    -0.59
    Positive Logits
    " namely"       0.89
    " Firstly"      0.72
    " notably"      0.63
    "xual"          0.62
    " including"    0.59
    " viz"          0.57
    "etsk"          0.56
    " redund"       0.53
    " includ"       0.53
    " weddings"     0.53
    Activation Density: 0.266%

    No Known Activations