    Negative Logits
    dar                 -0.07
     qed                -0.06
     enlightenment      -0.06
     contradictory      -0.06
     attachment         -0.06
    intersect           -0.06
     mirror             -0.06
     dec                -0.06
     vids               -0.06
     Peng               -0.06
    Positive Logits
    ++++++++++++++++    0.07
     costumes           0.07
     Costa              0.07
    Cost                0.07
    στα                 0.07
     "%                 0.07
     Costume            0.07
    (unrendered token)  0.07
     costume            0.07
    este                0.07
    Activation Density: 0.005%

    No Known Activations