INDEX
    Explanations

    prepositions or conjunctions at the beginning of a phrase followed by a strong sentiment or action term towards the end of the phrase

    concepts related to cause-and-effect relationships or outcomes

    Negative Logits
    ounding     -0.85
    abs         -0.74
    isible      -0.71
    vana        -0.70
    egu         -0.70
    ét          -0.69
    ums         -0.68
    imens       -0.68
    aeda        -0.68
    ophy        -0.68

    Positive Logits
     they       0.99
     it         0.84
     opted      0.82
     reverted   0.80
     we         0.79
     forgot     0.79
     he         0.77
     became     0.75
     there      0.75
     manages    0.75

    Activation Density: 0.466%

    No Known Activations