INDEX
    Explanations

    phrases indicating universality or consistency across all entities

    phrases that indicate widespread or collective situations

    Negative Logits
    " Siber"    -0.81
    "uala"      -0.79
    " Rout"     -0.67
    " Berk"     -0.66
    " Mub"      -0.65
    "Ô"         -0.64
    "eport"     -0.63
    " Ads"      -0.63
    " Frie"     -0.63
    "anson"     -0.62
    Positive Logits
    " notch"          0.73
    "stairs"          0.64
    "lihood"          0.64
    "etheless"        0.63
    "isphere"         0.62
    " improvement"    0.62
    "rust"            0.60
    " pathological"   0.59
    "atics"           0.58
    " equation"       0.57
    Activation Density: 0.054%

    No Known Activations