INDEX
    Explanations

    phrases or sentences indicating a reason or justification

    instances of the word "because" indicating reasoning or justification

    Negative Logits
    Token         Logit
    "wn"          -0.72
    "jet"         -0.70
    "pmwiki"      -0.68
    "alysed"      -0.67
    "iets"        -0.66
    "whel"        -0.64
    "ax"          -0.64
    " exting"     -0.64
    "ns"          -0.64
    "moderate"    -0.64
    Positive Logits
    Token           Logit
    " otherwise"    1.05
    " unlike"       0.95
    " hey"          0.94
    " nobody"       0.91
    " there"        0.91
    " frankly"      0.89
    " it"           0.88
    " they"         0.88
    " obviously"    0.88
    " we"           0.85
    Act Density 0.093%

    No Known Activations