INDEX
    Explanations

    comparisons between original and imitation items or works

    Negative Logits
    strup        -0.15
    alo          -0.14
    achen        -0.13
    лоп          -0.13
    thinkable    -0.13
     miesz       -0.13
    oux          -0.13
    rve          -0.13
    atra         -0.13
    ople         -0.12
    Positive Logits
     original     1.30
    original      1.11
     originals    1.02
     Original     1.02
     ORIGINAL     0.98
    -original     0.96
    Original      0.95
     originally   0.94
    .original     0.88
    原            0.88
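The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. Below is a minimal sketch of how such lists are commonly derived. It assumes, without confirmation from this page, that a token's logit effect is the dot product of the feature's decoder direction with that token's unembedding vector, and that each list keeps the k largest or k smallest scores. `logit_effects` and `top_logits` are hypothetical helper names, and the vectors are made-up toy values, not this feature's weights.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a feature's effect on a token's logit is the dot product of the
# feature's decoder direction with that token's unembedding vector.
def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score every token by decoder_dir . unembed[token]."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example (hypothetical values); real models have
    # far larger residual streams and vocabularies.
    decoder_dir = [1.0, 0.5]
    unembed = {
        " original": [1.0, 0.6],
        " Original": [0.9, 0.2],
        "alo": [0.0, -0.2],
        "strup": [-0.1, -0.1],
    }
    pos, neg = top_logits(logit_effects(decoder_dir, unembed), k=2)
    for tok, score in pos + neg:
        print(f"{tok!r:>12} {score:+.2f}")
```

Sorting once and slicing both ends keeps the two lists consistent with each other. With a full vocabulary, `heapq.nlargest` and `heapq.nsmallest` would avoid sorting every token.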
    Activation Density 0.458%
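Activation density is usually read as the percentage of sampled tokens on which the feature fires. At 0.458%, that is roughly 458 tokens out of every 100,000. Below is a minimal sketch. It assumes "fires" means an activation strictly above zero, because the threshold and sample size behind the figure above are not stated. `activation_density` is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0 by default).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 458 firing positions out of 100,000 tokens.
    sample = [0.0] * 99_542 + [2.3] * 458
    print(f"{activation_density(sample):.3f}%")  # prints 0.458%
```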

    No Known Activations