INDEX
    Explanations

    double quotation marks

    quotations and punctuation marks indicating dialogue or speech

    Negative Logits
     Flavoring                -0.80
     accomp                   -0.79
    inois                     -0.79
     hoard                    -0.75
     angered                  -0.74
     chair                    -0.73
     surpr                    -0.72
     behav                    -0.72
     Klu                      -0.71
     agre                     -0.71
    Positive Logits
    BuyableInstoreAndOnline    1.08
    Hello                      1.04
    false                      1.04
    Hey                        0.99
    normal                     0.98
    SELECT                     0.94
    hey                        0.94
    wcsstore                   0.92
    true                       0.92
    evil                       0.89
    Activation Density: 0.168%

    No Known Activations