INDEX
    Explanations

    negations or instances of "won't."

    Negative Logits
    gypt            -0.71
    eki             -0.63
    illin           -0.62
    OTOS            -0.61
    Factor          -0.60
     periphery      -0.60
     Traps          -0.59
    bian            -0.58
     compr          -0.57
     constrained    -0.57

    Positive Logits
    't              1.43
    itive           1.08
    cest            0.95
    now             0.94
    iors            0.85
    geon            0.83
    ced             0.82
    ests            0.81
    ipeg            0.81
    cing            0.81
    Act Density 0.036%

    No Known Activations