INDEX
    Explanations

    phrases related to arguments or justifications

    the word "that" in various contexts

    Negative Logits
    Guard       -0.71
    Plus        -0.68
    Bonus       -0.67
    EMBER       -0.67
    gments      -0.66
    guard       -0.65
    Laughs      -0.62
    RIP         -0.62
    AND         -0.61
    wn          -0.61
    Positive Logits
     they       0.75
     although   0.75
     "[         0.74
    cher        0.68
     prevailed  0.67
    eday        0.65
     misunder   0.65
     "...       0.65
     there      0.64
     it         0.63
    Act Density 0.196%

    No Known Activations