INDEX
    Explanations

    expressions of appreciation and positivity

    Negative Logits
    Token               Logit
    " noDo"             -0.60
    " BrowserModule"    -0.48
    "Eminem"            -0.48
    " myſelf"           -0.48
    "rillation"         -0.48
    "Battlefield"       -0.47
    " Paglinawan"       -0.47
    "ſelf"              -0.47
    "₁,"                -0.47
    " Descartes"        -0.47

    Positive Logits
    Token               Logit
    "lovely"            0.92
    " Lovely"           0.90
    " lovely"           0.90
    "Lovely"            0.87
    " nice"             0.69
    "Nice"              0.67
    "nice"              0.67
    "wonderful"         0.64
    " Nice"             0.64
    "Wonderful"         0.59

    Activation Density: 0.001%

    No Known Activations