INDEX
    Explanations

    phrases indicating contrast or contradiction

    phrases indicating temporal contexts or factual assertions

    Negative Logits
    " aspiration"       -0.60
    " holdings"         -0.58
    " unfocusedRange"   -0.57
    "ucker"             -0.56
    " Telegram"         -0.56
    "qt"                -0.56
    " subreddit"        -0.55
    "legram"            -0.55
    " Ludwig"           -0.55
    "pherd"             -0.55
    Positive Logits
    "ean"               0.71
    "irlf"              0.69
    "udeb"              0.69
    "fil"               0.67
    "ealous"            0.64
    "})"                0.63
    "©¶æ"               0.62
    "azard"             0.62
    "ķ"                 0.61
    "Spoiler"           0.59
    Activation Density: 0.352%

    No Known Activations