INDEX
    Explanations

    mentions of significant events or accomplishments

    sentences that convey a strong emphasis or conclude statements

    Negative Logits
    " pretended"      -0.76
    " default"        -0.72
    " gamb"           -0.69
    " defaults"       -0.69
    " split"          -0.68
    " pse"            -0.68
    " splits"         -0.68
    " misdem"         -0.66
    "iliated"         -0.66
    "cheat"           -0.66
    Positive Logits
    " Moreover"       1.22
    "<|endoftext|>"   1.20
    " Additionally"   1.16
    " Furthermore"    1.16
    " However"        1.13
    " Nevertheless"   1.12
    " Indeed"         1.08
    " Unfortunately"  1.08
    " Accordingly"    1.07
    " Nonetheless"    1.06
    Act Density 0.547%

    No Known Activations