INDEX
    Explanations

    invisible, hidden

    Negative Logits
        -0.08  (token missing in source)
        -0.08  "奋斗"
        -0.07  " casc"
        -0.07  " rustig"
        -0.07  "ertjes"
        -0.07  " marts"
        -0.07  " Praise"
        -0.07  " temples"
        -0.07  ".Abs"
        -0.07  "huizen"

    Positive Logits
         0.10  "Blind"
         0.10  ".decrypt"
         0.10  " секрет"
         0.10  ".encrypt"
         0.10  "blind"
         0.10  "Hidden"
         0.09  " ultraviolet"
         0.09  ".secret"
         0.09  "hidden"
         0.09  "Decrypt"

    Activation Density: 0.030%

    No Known Activations