    Explanations

    phrases including the name of a person or organization

    instances of commas or punctuation in lists

    Negative Logits
    Tokens    -0.69
    ");       -0.65
    ²¾        -0.63
    Reward    -0.63
    abolic    -0.62
    ');       -0.60
    aceae     -0.60
    aughs     -0.59
    worldly   -0.58
    tsy       -0.58
    Positive Logits
     meanwhile       2.16
     however         1.72
     moreover        1.45
     meantime        1.25
     likewise        1.19
     though          1.12
     incidentally    1.11
     furthermore     1.10
     unsurprisingly  1.06
     therefore       1.04
    Activation Density: 0.212%

    No Known Activations