    Explanations

    Phrases that emphasize positivity, or adjectives describing high quality

    Negative Logits
      " ton"       -0.22
      "-ton"       -0.18
      "ugo"        -0.16
      " Ton"       -0.16
      " swath"     -0.16
      " tons"      -0.16
      " sampling"  -0.16
      "oyer"       -0.16
      "ton"        -0.15
      " needed"    -0.15

    Positive Logits
      " cracking"  0.26
      "Joined"     0.20
      " range"     0.20
      " emot"      0.19
      " subsid"    0.18
      "intree"     0.18
      " contrib"   0.18
      " raft"      0.18
      " rethink"   0.18
      " advert"    0.18

    Activation Density: 0.318%

    No Known Activations