INDEX
    Explanations

    expressions of ability or capability

    Negative Logits
    Token           Logit
    " honors"       -0.18
    " coloring"     -0.17
    " Favorite"     -0.17
    "Favorite"      -0.15
    " modeled"      -0.15
    "umor"          -0.15
    " Flavor"       -0.15
    " armored"      -0.15
    " theater"      -0.15
    " signaled"     -0.15
    Positive Logits
    Token           Logit
    "Liked"         0.15
    " Democr"       0.15
    "edImage"       0.15
    " organisers"   0.15
    " image"        0.14
    ".imag"         0.14
    "erator"        0.14
    "awah"          0.14
    "-image"        0.14
    "719"           0.14
    Activation Density: 0.051%

    No Known Activations