    Explanations

    mentions of replacement or substitution

    instances of the word "replaced"

    Negative Logits
    Fight       -0.76
    hawk        -0.74
    eb          -0.73
    NG          -0.72
    raq         -0.71
    Import      -0.71
    emi         -0.69
    Dream       -0.68
    WAY         -0.66
    apa         -0.65
    Positive Logits
     replaces     0.91
     replaced     0.88
     obsolete     0.86
     replacing    0.82
     replace      0.81
    mentation     0.80
    ãĥĺ           0.78
     destro       0.78
    mented        0.78
     replacement  0.77
    Activation Density: 0.012%

    No Known Activations