    Explanations

    instances where someone is expressing an opinion or statement

    Negative Logits
    =~=~         -0.77
    ï¸ı          -0.69
    ierrez       -0.69
    ositories    -0.68
    Redd         -0.67
    utonium      -0.65
    onut         -0.64
    thia         -0.64
    aint         -0.62
    icut         -0.62
    Positive Logits
     goodbye      1.25
     bye          1.03
     aloud        0.90
     hello        0.84
     farewell     0.75
     "            0.75
     Goodbye      0.69
     loudly       0.68
     sorry        0.68
     publicly     0.66
    Activation Density: 0.058%

    No Known Activations