    NEGATIVE LOGITS
    ucl         -0.70
    Neigh       -0.67
    upon        -0.64
    STEP        -0.63
    pher        -0.62
    ulous       -0.62
    itsch       -0.62
    ocene       -0.62
    Cooldown    -0.61
    leased      -0.60
    POSITIVE LOGITS
    cy          0.99
    berries     0.88
    cies        0.86
    esses       0.85
    berry       0.83
     Kidd       0.83
    istics      0.81
    TAIN        0.80
     Hook       0.76
    autical     0.74
    Activation Density: 0.021%

    No Known Activations