    Negative Logits
        "+"            0.51
        "Template"     0.49
        "乃至"         0.46
        "ди"           0.45
        "For"          0.45
        "Gest"         0.45
        " or"          0.44
        "Con"          0.44
        "Learn"        0.44
        " templates"   0.43
    Positive Logits
        " soooo"       0.52
        " 🙂"          0.52
        "😊"           0.50
        " 😁"          0.49
        " 😀"          0.48
        "🙂"           0.47
        "😁"           0.47
        " !!!!"        0.47
        " 😊"          0.46
        " sooo"        0.46
    Activation Density: 0.000%

    No Known Activations