    Negative Logits
    " Weak"        -0.07
    " weak"        -0.07
    " Prosper"     -0.06
    " virtue"      -0.06
    " Fig"         -0.06
    " Exec"        -0.06
    " overall"     -0.06
    " road"        -0.06
    " Çocuk"       -0.06
    " Liu"         -0.06
    Positive Logits
    " condensed"   0.12
    "KD"           0.07
    "MD"           0.07
    " espa"        0.07
    "μιουργ"       0.06
    " Labrador"    0.06
    (token not shown)  0.06
    (token not shown)  0.06
    (token not shown)  0.06
    " markdown"    0.06
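    The page does not say how these logit values were computed. A common convention in SAE feature dashboards is to take the feature's decoder direction, project it through the model's unembedding matrix, and report the most positive and most negative entries. The sketch below assumes that convention. The function name `top_logit_tokens` and its arguments (`decoder_dir`, `W_U`, `vocab`) are hypothetical and do not come from any specific library.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the unembedding and
    return the k most positive and k most negative (token, logit) pairs.

    Assumed shapes (hypothetical inputs, not from the source page):
      decoder_dir: (d_model,)          the feature's SAE decoder row
      W_U:         (d_model, d_vocab)  the model's unembedding matrix
      vocab:       list of d_vocab token strings
    """
    logits = decoder_dir @ W_U                 # direct logit effect per token, shape (d_vocab,)
    order = np.argsort(logits)                 # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.12, -0.07, 0.06, 0.0],
                [0.0,   0.0,  0.0, 0.0]])
vocab = [" condensed", " Weak", " markdown", " road"]

pos, neg = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(pos)  # [(' condensed', 0.12), (' markdown', 0.06)]
print(neg)  # [(' Weak', -0.07), (' road', 0.0)]
```

    Leading spaces are part of the token strings. That is why " Weak" and "KD" are distinct kinds of entries in the lists.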
    Activation Density: 0.002%
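    An activation density of 0.002% means the feature was active on about 2 of every 100,000 tokens in the sample it was measured on, or roughly 20 per million. The sketch below shows that arithmetic. It also shows one way density could be computed, assuming a token counts as active when the feature's activation is strictly positive. The function `activation_density` and its input are hypothetical.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    feature_acts: 1-D array of one feature's activations over a token sample
    (hypothetical input; the page does not publish its sample or threshold).
    """
    acts = np.asarray(feature_acts, dtype=float)
    return float(np.mean(acts > threshold))

# Convert the reported percentage into an expected count of active tokens.
density = 0.002 / 100               # 0.002% as a fraction, about 2e-5
per_million = density * 1_000_000   # roughly 20 active tokens per million

# Toy check: one active token out of four gives a density of 25%.
print(activation_density([0.0, 0.0, 1.5, 0.0]))  # 0.25
```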

    No Known Activations