INDEX
    Explanations

    references to disguise and transformation

    Negative Logits
    Token        Logit
    "ãng"        -0.19
    " anale"     -0.19
    "aises"      -0.19
    "isay"       -0.18
    "rysler"     -0.16
    "anky"       -0.16
    "jong"       -0.16
    "ientos"     -0.15
    "лÑıн"       -0.15
    "anou"       -0.15
    Positive Logits
    Token           Logit
    " convinc"      0.18
    " identity"     0.17
    " adopted"      0.17
    ".identity"     0.17
    " convincing"   0.17
    " persona"      0.17
    " Identity"     0.17
    " alter"        0.16
    "identity"      0.15
    " covering"     0.15
    Activation Density: 0.144%

    No Known Activations