INDEX
    Explanations

    phrases indicating reasons or motivations for actions

    phrases that specify causal relationships or conditions, particularly the word "because"

    Negative Logits
    "ength"         -0.67
    ")]."           -0.66
    "mun"           -0.65
    " Flavoring"    -0.62
    " Il"           -0.61
    " Strongh"      -0.60
    "©¶æ"           -0.60
    "Home"          -0.59
    "imeters"       -0.59
    "creen"         -0.59

    Positive Logits
    " mention"       0.79
    " slightest"     0.75
    " versa"         0.69
    " nor"           0.67
    " anything"      0.66
    " necessarily"   0.65
    "icable"         0.65
    " anywhere"      0.64
    "ivable"         0.63
    " anymore"       0.63
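The dashboard does not say how its logit lists are produced. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive token scores. The sketch below assumes that approach. The function names `logit_effects` and `top_and_bottom`, the toy vocabulary, and the toy weights are all hypothetical, and the scores it prints are not the values listed above.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting the feature's decoder
    direction (length d_model) through an unembedding matrix stored as
    d_model rows of vocab_size entries each. (Assumed method.)"""
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return bottom, top


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["because", " nor", "ength", "Home"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.4, 0.6, -0.5, -0.2],
    [0.2, 0.2, -0.3, -0.4],
]

scores = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(scores, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Keeping the leading space in tokens such as `" nor"` matters: byte-level BPE vocabularies treat `"nor"` and `" nor"` as different tokens, which is why the lists above quote each token.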
    Activation Density: 0.138%
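Assuming the density figure is the share of token positions on which the feature's activation exceeds zero (the dashboard states neither its sample size nor its threshold), 0.138% corresponds to roughly 1.4 firing positions per 1,000 tokens. A minimal sketch of that calculation follows. The function name `activation_density`, the zero threshold, and the 100,000-token sample are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation is above
    threshold (assumed definition of activation density)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 100,000 token positions, 138 of which fire.
acts = [0.0] * 99_862 + [1.0] * 138
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.138%
```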

    No Known Activations