INDEX
    Explanations

    references to specific brands or products, particularly in the context of fashion and watches

    Negative Logits
    elden       -0.19
    Ñıж         -0.14
     zdrav      -0.14
    ãĥ¼ãĥĵ      -0.13
    ÙĪØ²Ùĩ      -0.13
    -gnu        -0.13
     ragaz      -0.13
    Ù쨴         -0.13
    ecz         -0.13
    flutter     -0.13
    Positive Logits
     replica    0.37
     Replica    0.34
     fake       0.30
     imitation  0.30
     Cart       0.30
     replicas   0.29
     Fake       0.28
     copy       0.28
     ro         0.28
    Rep         0.27
    Activation Density: 0.007%

    No Known Activations