INDEX
    Explanations

    phrases related to small examples or elements representing a larger concept

    references to significant issues or underlying problems

    Negative Logits
    "chev"         -0.82
    "iors"         -0.80
    "unctions"     -0.75
    "Daily"        -0.73
    "ancies"       -0.72
    "OWN"          -0.72
    "LY"           -0.71
    "ummies"       -0.69
    "lords"        -0.67
    "cler"         -0.66
    Positive Logits
    " iceberg"     1.39
    " scale"       0.87
    " spear"       0.82
    " scales"      0.82
    " rope"        0.78
    " proverbial"  0.76
    " wedge"       0.74
    " finger"      0.74
    " mustard"     0.72
    " fingers"     0.67
    Activation Density: 0.096%

    No Known Activations