INDEX
    Explanations

    mentions of winning or achieving success in competitions or games

    phrases that indicate completeness or totality

    Negative Logits
    "grad"          -0.68
    "aminer"        -0.56
    " EVERY"        -0.56
    "stone"         -0.56
    "robe"          -0.56
    " Malf"         -0.55
    "lav"           -0.55
    " sometimes"    -0.54
    " lad"          -0.52
    "plin"          -0.51
    Positive Logits
    "ocating"       1.04
    "usions"        0.97
    "uding"         0.94
    " three"        0.92
    "igator"        0.92
    "udes"          0.89
    "ocation"       0.88
    " four"         0.88
    "ogene"         0.88
    "ocations"      0.84
    Activation Density: 0.106%

    No Known Activations