INDEX
    Explanations

    actions and roles being exchanged or substituted among characters

    Negative Logits
    Token           Logit
    "ungal"         -0.15
    "Jon"           -0.14
    "uzey"          -0.14
    "woff"          -0.14
    "nder"          -0.14
    "ystore"        -0.13
    "lc"            -0.13
    "URE"           -0.13
    " acquaint"     -0.13
    "lection"       -0.13
    Positive Logits
    Token               Logit
    " replace"          0.31
    " replacement"      0.28
    "replace"           0.27
    " replaced"         0.26
    " replacing"        0.26
    "replacement"       0.26
    " replacements"     0.26
    "代"                0.25
    ".replace"          0.25
    " replaces"         0.24
    Activation Density 0.087%

    No Known Activations