INDEX
    Explanations

    phrases related to personal reflections and opinions

    expressions of strong emotions and reactions

    Negative Logits
    catentry      -0.81
    senal         -0.66
     spam         -0.65
     boosting     -0.63
    ikarp         -0.63
    cific         -0.62
    maxwell       -0.61
     OG           -0.61
     carbohyd     -0.60
     keyword      -0.60
    Positive Logits
    ,—            1.04
    ;             0.90
    .—            0.85
     lest         0.80
    enance        0.79
    ankind        0.79
     truths       0.78
    enment        0.78
    !             0.77
    ––            0.76
    Act Density 0.513%

    No Known Activations