INDEX
    NEGATIVE LOGITS
    Reviewer        -0.84
    Interstitial    -0.72
     gifted         -0.70
    speaking        -0.68
     thirsty        -0.67
    atem            -0.65
     Ital           -0.64
     juggling       -0.64
    desc            -0.63
    ographed        -0.61
    POSITIVE LOGITS
     2015           0.90
     2017           0.88
     2021           0.88
     2016           0.84
     Aug            0.83
    onna            0.83
     2025           0.81
     2011           0.81
     2019           0.81
    nard            0.81
    Activation Density: 0.090%

    No Known Activations