INDEX
    NEGATIVE LOGITS
    izens    -0.72
    cia      -0.69
    may      -0.68
    Corn     -0.68
    WIND     -0.66
    ESE      -0.66
    CSS      -0.65
    Aren     -0.65
    cue      -0.65
    soon     -0.64
    POSITIVE LOGITS
     necessarily    1.22
     relying        1.11
     bothering      1.06
     merely         1.04
     letting        0.95
     simply         0.93
     outright       0.92
     risking        0.90
     focusing       0.87
     allowing       0.87
    Activation Density: 0.056%

    No Known Activations