INDEX
    Explanations

    clothing items and colors

    Negative Logits
    0.59
    لع  0.57
     Ги  0.57
    enl  0.56
     Международ  0.56
    Bộ  0.56
     Omphalodes  0.56
     Nós  0.55
     Concrete  0.55
    trajectory  0.55
    Positive Logits
     ajuda  0.69
     livros  0.69
     tengo  0.68
     dejar  0.68
     tuvieron  0.67
     mémoire  0.66
     musia  0.66
     fucking  0.65
     menulis  0.65
     satın  0.64
    Act Density 0.001%

    No Known Activations