INDEX
    Explanations

    phrases indicating causality or conditionality

    Negative Logits
    Token       Logit
    "yna"       -0.69
    "elve"      -0.66
    "CVE"       -0.65
    " pione"    -0.63
    "Virgin"    -0.63
    "inois"     -0.63
    "uty"       -0.63
    "isively"   -0.62
    "atl"       -0.61
    "atri"      -0.61
    Positive Logits
    Token           Logit
    " they"         1.33
    " THEY"         1.10
    " something"    1.08
    " someone"      1.06
    " there"        1.04
    " somebody"     1.02
    " you"          1.01
    " everything"   0.98
    " it"           0.94
    " theirs"       0.91
    Activation Density: 0.235%

    No Known Activations