INDEX
    Explanations

    phrases or terms indicating reasons and justifications for actions or opinions

    Negative Logits
    Token      Logit
    "aro"      -0.14
    "forme"    -0.14
    "BERT"     -0.14
    ":^"       -0.14
    "bert"     -0.13
    "egg"      -0.13
    "aan"      -0.13
    "eg"       -0.13
    "rts"      -0.13
    "الة"      -0.13
    Positive Logits
    Token          Logit
    " sake"        0.58
    " purposes"    0.52
    " purpose"     0.27
    " reasons"     0.26
    "pur"          0.21
    "purpose"      0.21
    " PURPOSE"     0.20
    "Purpose"      0.20
    " reason"      0.18
    "_REASON"      0.18
    Act Density 1.274%

    No Known Activations