INDEX
    Explanations

    phrases expressing understanding or disagreement

    expressions of doubt or uncertainty

    Negative Logits
    Token             Logit
    "ortment"         -0.70
    " unavoid"        -0.62
    " unsus"          -0.62
    " overcoming"     -0.61
    " respectively"   -0.61
    "vironment"       -0.61
    "BIL"             -0.61
    "atars"           -0.59
    " awaited"        -0.58
    " overcome"       -0.58
    Positive Logits
    Token             Logit
    " anymore"        1.04
    " myself"         0.97
    "yet"             0.85
    " nor"            0.80
    "poke"            0.78
    " anybody"        0.72
    " EVER"           0.71
    "à¼"              0.69
    " specifics"      0.67
    " anywhere"       0.63
    Act Density 0.293%

    No Known Activations