INDEX
    Explanations

    expressions of opinion or personal judgment

    phrases related to speech and expression of thoughts

    NEGATIVE LOGITS
    unal          -0.63
    agascar       -0.62
    pered         -0.62
    aired         -0.60
     Flavoring    -0.59
    astern        -0.58
    ockets        -0.58
    actionDate    -0.57
    agra          -0.57
     recomm       -0.56
    POSITIVE LOGITS
     '        1.29
     '[       1.28
     hey      1.20
     "'       1.19
     `        1.14
     \"       1.08
     '(       1.07
     wow      0.99
     "        0.98
     hello    0.97
    Activation Density: 0.148%

    No Known Activations